Foreword



I am delighted to write this foreword. This book, a reference where 
one can look up the details of most any algorithm to find a clear 
unambiguous description, has long been needed and here it finally is. 
A concise reference that has taken many hours to write but which has 
the capacity to save vast amounts of time previously spent digging out 
original papers. 

I have known the author for several years and have had experience 
of his amazing capacity for work and the sheer quality of his output, so 
this book comes as no surprise to me. But I hope it will be a surprise
and delight to you, the reader for whom it has been written. 

But useful as this book is, it is only a beginning. There are so many 
algorithms that no one author could hope to cover them all. So if you 
know of an algorithm that is not yet here, how about contributing it 
using the same clear and lucid style? 



Professor Tim Hendtlass 
Complex Intelligent Systems Laboratory 
Faculty of Information and Communication Technologies 
Swinburne University of Technology 



Melbourne, Australia 
2010 
Preface



About the book 

The need for this project was born of frustration while working to- 
wards my PhD. I was investigating optimization algorithms and was 
implementing a large number of them for a software platform called 
the Optimization Algorithm Toolkit (OAT)¹. Each algorithm required
considerable effort to locate the relevant source material (from books, 
papers, articles, and existing implementations), decipher and interpret 
the technique, and finally attempt to piece together a working imple- 
mentation. 

Taking a broader perspective, I realized that the communication of 
algorithmic techniques in the field of Artificial Intelligence was clearly a 
difficult and outstanding open problem. Generally, algorithm descrip- 
tions are: 

• Incomplete: many techniques are ambiguously described, partially 
described, or not described at all. 

• Inconsistent: a given technique may be described using a variety 
of formal and semi-formal methods that vary across different tech- 
niques, limiting the transferability of background skills an audience 
requires to read a technique (such as mathematics, pseudocode, 
program code, and narratives). An inconsistent representation for 
techniques means that the skills used to understand and internal- 
ize one technique may not be transferable to realizing different 
techniques or even extensions of the same technique. 

• Distributed: the description of data structures, operations, and 
parameterization of a given technique may span a collection of 
papers, articles, books, and source code published over a number 
of years, the access to which may be restricted and difficult to 
obtain. 

¹ OAT located at http://optalgtoolkit.sourceforge.net
For the practitioner, a badly described algorithm may be simply
frustrating, where the gaps in available information are filled with 
intuition and 'best guess'. At the other end of the spectrum, a badly 
described algorithm may be an example of bad science and the failure of 
the scientific method, where the inability to understand and implement 
a technique may prevent the replication of results, the application, or 
the investigation and extension of a technique. 

The software I produced provided a first step solution to this problem: 
a set of working algorithms implemented in a (somewhat) consistent way 
and downloaded from a single location (features likely provided by any 
library of artificial intelligence techniques). The next logical step needed
to address this problem is to develop a methodology that anybody can 
follow. The strategy to address the open problem of poor algorithm 
communication is to present complete algorithm descriptions (rather 
than just implementations) in a consistent manner, and in a centralized 
location. This book is the outcome of developing such a strategy that not 
only provides a methodology for standardized algorithm descriptions, but 
provides a large corpus of complete and consistent algorithm descriptions 
in a single centralized location. 

The algorithms described in this work are practical, interesting, and 
fun, and the goal of this project was to promote these features by making 
algorithms from the field more accessible, usable, and understandable. 
This project was developed over a number of years through a lot of writing,
discussion, and revision. This book has been released under a permissive 
license that encourages the reader to explore new and creative ways of 
further communicating its message and content. 

I hope that this project has succeeded in some small way and that you 
too can enjoy applying, learning, and playing with Clever Algorithms. 



Jason Brownlee



Melbourne, Australia 
2011 
Chapter 1

Introduction 



Welcome to Clever Algorithms! This is a handbook of recipes for com- 
putational problem solving techniques from the fields of Computational 
Intelligence, Biologically Inspired Computation, and Metaheuristics. 
Clever Algorithms are interesting, practical, and fun to learn about and 
implement. Research scientists may be interested in browsing algorithm 
inspirations in search of interesting system or process analogs to
investigate. Developers and software engineers may compare various 
problem solving algorithms and technique-specific guidelines. Practition- 
ers, students, and interested amateurs may implement state-of-the-art 
algorithms to address business or scientific needs, or simply play with 
the fascinating systems they represent. 

This introductory chapter provides relevant background information 
on Artificial Intelligence and Algorithms. The core of the book provides 
a large corpus of algorithms presented in a complete and consistent 
manner. The final chapter covers some advanced topics to consider 
once a number of algorithms have been mastered. This book has been 
designed as a reference text, where specific techniques are looked up, or 
where the algorithms across whole fields of study can be browsed, rather 
than being read cover-to-cover. This book is an algorithm handbook 
and a technique guidebook, and I hope you find something useful. 

1.1 What is AI 

1.1.1 Artificial Intelligence 

The field of classical Artificial Intelligence (AI) coalesced in the 1950s 
drawing on an understanding of the brain from neuroscience, the new 
mathematics of information theory, control theory referred to as cyber- 
netics, and the dawn of the digital computer. AI is a cross-disciplinary 
field of research that is generally concerned with developing and
investigating systems that operate or act intelligently. It is considered
a discipline in the field of computer science given the strong focus on 
computation. 

Russell and Norvig provide a perspective that defines Artificial Intel- 
ligence in four categories: 1) systems that think like humans, 2) systems 
that act like humans, 3) systems that think rationally, 4) systems that 
act rationally [43]. In their definition, acting like a human suggests that
a system can do some specific things humans can do; this includes fields
such as the Turing test, natural language processing, automated reason- 
ing, knowledge representation, machine learning, computer vision, and 
robotics. Thinking like a human suggests systems that model the cogni- 
tive information processing properties of humans, for example a general 
problem solver and systems that build internal models of their world. 
Thinking rationally suggests laws of rationalism and structured thought, 
such as syllogisms and formal logic. Finally, acting rationally suggests 
systems that do rational things such as expected utility maximization 
and rational agents. 
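
As a minimal sketch of the 'acting rationally' category, expected utility maximization can be illustrated with a toy decision. The action names, probabilities, and utilities below are hypothetical values chosen for illustration; they do not come from Russell and Norvig.

```python
def expected_utility(outcomes):
    """Sum of probability * utility over an action's possible outcomes."""
    return sum(p * u for p, u in outcomes)


def rational_choice(actions):
    """Return the name of the action with the highest expected utility."""
    return max(actions, key=lambda name: expected_utility(actions[name]))


# Hypothetical decision: each action maps to (probability, utility) pairs
# for the outcomes "rain" and "no rain".
actions = {
    "take_umbrella": [(0.3, 8.0), (0.7, 6.0)],
    "leave_umbrella": [(0.3, 0.0), (0.7, 10.0)],
}

best = rational_choice(actions)
print(best, expected_utility(actions[best]))
```

The agent here is 'rational' only in the narrow sense that it selects the action with the largest probability-weighted payoff under its own model of the world.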

Luger and Stubblefield suggest that AI is a sub-field of computer 
science concerned with the automation of intelligence, and like other 
sub-fields of computer science has both theoretical concerns (how and 
why do the systems work?) and application concerns (where and when 
can the systems be used?) [34]. They suggest a strong empirical focus to 
research, because although there may be a strong desire for mathematical 
analysis, the systems themselves defy analysis given their complexity. 
The machines and software investigated in AI are not black boxes;
rather, analysis proceeds by observing the systems' interactions with their
environments, followed by an internal assessment of the system to relate 
its structure back to its behavior. 

Artificial Intelligence is therefore concerned with investigating mechanisms
that underlie intelligence and intelligent behavior. The traditional
approach toward designing and investigating AI (the so-called
'good old fashioned' AI) has been to employ a symbolic basis for these 
mechanisms. A newer approach historically referred to as scruffy artifi- 
cial intelligence or soft computing does not necessarily use a symbolic 
basis, instead patterning these mechanisms after biological or natural 
processes. This represents a modern paradigm shift in interest from sym- 
bolic knowledge representations, to inference strategies for adaptation 
and learning, and has been referred to as neat versus scruffy approaches 
to AI. The neat philosophy is concerned with formal symbolic models 
of intelligence that can explain why they work, whereas the scruffy 
philosophy is concerned with intelligent strategies that explain how they 
work [44]. 
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Neat AI 

The traditional stream of AI concerns a top down perspective of problem 
solving, generally involving symbolic representations and logic processes 
that most importantly can explain why the systems work. The successes 
of this prescriptive stream include a multitude of specialist approaches 
such as rule-based expert systems, automatic theorem provers, and operations
research techniques that underlie modern planning and scheduling
software. Although traditional approaches have resulted in significant 
success they have their limits, most notably scalability. Increases in 
problem size result in an unmanageable increase in the complexity of 
such problems meaning that although traditional techniques can guar- 
antee an optimal, precise, or true solution, the computational execution 
time or computing memory required can be intractable. 
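
The scalability limit can be made concrete with a small sketch. It assumes, purely for illustration, a round-trip routing problem over n cities with symmetric distances, where an exhaustive method that guarantees the optimum must consider every distinct tour. The problem choice and the function name are assumptions, not taken from the text.

```python
import math


def tour_count(n_cities):
    """Number of distinct round trips through n cities with symmetric distances."""
    if n_cities < 3:
        return 1
    # Fix the start city and ignore direction of travel: (n - 1)! / 2 tours.
    return math.factorial(n_cities - 1) // 2


for n in (5, 10, 20, 30):
    print(n, tour_count(n))
```

Each additional city multiplies the number of candidate tours, so the work required by exhaustive enumeration quickly outgrows any realistic time or memory budget.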

Scruffy AI 

There have been a number of thrusts in the field of AI toward less crisp 
techniques that are able to locate approximate, imprecise, or partially- 
true solutions to problems with a reasonable cost of resources. Such 
approaches are typically descriptive rather than prescriptive, describing 
a process for achieving a solution (how), but not explaining why they 
work (like the neater approaches). 

Scruffy AI approaches are defined as relatively simple procedures that 
result in complex emergent and self-organizing behavior that can defy 
traditional reductionist analyses, the effects of which can be exploited 
for quickly locating approximate solutions to intractable problems. A 
common characteristic of such techniques is the incorporation of random- 
ness in their processes resulting in robust probabilistic and stochastic 
decision making contrasted to the sometimes more fragile determinism 
of the crisp approaches. Another important common attribute is the 
adoption of an inductive rather than deductive approach to problem solv- 
ing, generalizing solutions or decisions from sets of specific observations 
made by the system. 

1.1.2 Natural Computation 

An important perspective on scruffy Artificial Intelligence is the moti- 
vation and inspiration for the core information processing strategy of 
a given technique. Computers can only do what they are instructed to do;
therefore, a consideration is to distill information processing from other
fields of study, such as the physical world and biology. The study of 
biologically motivated computation is called Biologically Inspired Com- 
puting [16], and is one of three related fields of Natural Computing 
[22, 23, 39]. Natural Computing is an interdisciplinary field concerned 
with the relationship of computation and biology, which in addition to
Biologically Inspired Computing is also comprised of Computationally 
Motivated Biology and Computing with Biology [36, 40]. 

Biologically Inspired Computation 

Biologically Inspired Computation is computation inspired by biological 
metaphor, also referred to as Biomimicry and Biomimetics in other
engineering disciplines [6, 17]. The intent of this field is to devise mathematical
and engineering tools to generate solutions to computation
problems. The field involves using procedures for finding solutions ab- 
stracted from the natural world for addressing computationally phrased 
problems. 

Computationally Motivated Biology 

Computationally Motivated Biology involves investigating biology using 
computers. The intent of this area is to use information sciences and 
simulation to model biological systems in digital computers with the 
aim to replicate and better understand behaviors in biological systems. 
The field facilitates the ability to better understand life-as-it-is and 
investigate life-as-it-could-be. Typically, work in this sub-field is not 
concerned with the construction of mathematical and engineering tools, 
rather it is focused on simulating natural phenomena. Common examples 
include Artificial Life, Fractal Geometry (L-systems, Iterated Function
Systems, Particle Systems, Brownian motion), and Cellular Automata. 
A related field is that of Computational Biology generally concerned with 
modeling biological systems and the application of statistical methods 
such as in the sub-field of Bioinformatics. 

Computation with Biology 

Computation with Biology is the investigation of substrates other than 
silicon in which to implement computation [1]. Common examples 
include molecular or DNA Computing and Quantum Computing. 

1.1.3 Computational Intelligence 

Computational Intelligence is a modern name for the sub-field of AI 
concerned with sub-symbolic (also called messy, scruffy, and soft) tech- 
niques. Computational Intelligence describes techniques that focus on 
strategy and outcome. The field broadly covers sub-disciplines that 
focus on adaptive and intelligent systems, including but not limited to:
Evolutionary Computation, Swarm Intelligence (Particle Swarm and Ant
Colony Optimization), Fuzzy Systems, Artificial Immune Systems, and
Artificial Neural Networks [20, 41]. This section provides a brief summary
of each of the five primary areas of study.

Evolutionary Computation 

A paradigm that is concerned with the investigation of systems inspired 
by the neo-Darwinian theory of evolution by means of natural selection 
(natural selection theory and an understanding of genetics). Popular 
evolutionary algorithms include the Genetic Algorithm, Evolution Strat- 
egy, Genetic and Evolutionary Programming, and Differential Evolution 
[4, 5]. The evolutionary process is considered an adaptive strategy and 
is typically applied to search and optimization domains [26, 28]. 

Swarm Intelligence 

A paradigm that considers collective intelligence as a behavior that 
emerges through the interaction and cooperation of large numbers 
of less intelligent agents. The paradigm consists of two dominant
sub-fields: 1) Ant Colony Optimization that investigates probabilistic
algorithms inspired by the foraging behavior of ants [10, 18], and 2) 
Particle Swarm Optimization that investigates probabilistic algorithms 
inspired by the flocking and foraging behavior of birds and fish [30]. 
Like evolutionary computation, swarm intelligence-based techniques are 
considered adaptive strategies and are typically applied to search and 
optimization domains. 

Artificial Neural Networks 

Neural Networks are a paradigm that is concerned with the investigation 
of architectures and learning strategies inspired by the modeling of 
neurons in the brain [8]. Learning strategies are typically divided into 
supervised and unsupervised which manage environmental feedback 
in different ways. Neural network learning processes are considered 
adaptive learning and are typically applied to function approximation 
and pattern recognition domains. 

Fuzzy Intelligence 

Fuzzy Intelligence is a paradigm that is concerned with the investigation 
of fuzzy logic, which is a form of logic that is not constrained to true 
and false determinations like propositional logic, but rather uses functions
that define approximate truth, or degrees of truth [52]. Fuzzy logic
and fuzzy systems are a logic system used as a reasoning strategy and 
are typically applied to expert system and control system domains. 
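
The idea of degrees of truth can be sketched with a membership function. The triangular shape, the temperature sets, and the use of min, max, and complement as the fuzzy AND, OR, and NOT operators are common textbook choices assumed here for illustration, not definitions taken from [52].

```python
def triangular(x, left, peak, right):
    """Degree of membership in [0, 1] for a triangular fuzzy set."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def fuzzy_and(a, b):
    return min(a, b)


def fuzzy_or(a, b):
    return max(a, b)


def fuzzy_not(a):
    return 1.0 - a


# Hypothetical fuzzy sets over temperature in degrees Celsius.
temperature = 22.0
warm = triangular(temperature, 15.0, 25.0, 35.0)
hot = triangular(temperature, 20.0, 35.0, 50.0)

print("warm:", warm, "hot:", hot)
print("warm AND hot:", fuzzy_and(warm, hot))
print("warm OR hot:", fuzzy_or(warm, hot))
```

A temperature can therefore be partly 'warm' and partly 'hot' at the same time, which a two-valued propositional statement cannot express.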
Artificial Immune Systems

A collection of approaches inspired by the structure and function of the 
acquired immune system of vertebrates. Popular approaches include 
clonal selection, negative selection, the dendritic cell algorithm, and 
immune network algorithms. The immune-inspired adaptive processes 
vary in strategy and show similarities to the fields of Evolutionary 
Computation and Artificial Neural Networks, and are typically used for 
optimization and pattern recognition domains [15]. 

1.1.4 Metaheuristics 

Another popular name for the strategy-outcome perspective of scruffy AI 
is metaheuristics. In this context, a heuristic is an algorithm that locates
'good enough' solutions to a problem without concern for whether the
solution can be proven to be correct or optimal [37]. Heuristic methods
trade off concerns such as precision, quality, and accuracy in favor of
computational effort (space and time efficiency). The greedy search
procedure that only takes cost-improving steps is an example of a heuristic
method.
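
A greedy search that only takes cost-improving steps might be sketched as follows. The one-dimensional cost function, step size, and iteration budget are assumptions made for illustration.

```python
import random


def greedy_search(cost, start, step=0.1, max_iters=1000, seed=1):
    """Accept a random neighbor only when it lowers the cost."""
    rng = random.Random(seed)
    current, current_cost = start, cost(start)
    for _ in range(max_iters):
        candidate = current + rng.uniform(-step, step)
        candidate_cost = cost(candidate)
        if candidate_cost < current_cost:  # only cost-improving steps
            current, current_cost = candidate, candidate_cost
    return current, current_cost


def cost(x):
    """Hypothetical cost with a single minimum at x = 3."""
    return (x - 3.0) ** 2


best, best_cost = greedy_search(cost, start=0.0)
print(best, best_cost)
```

The procedure offers no proof that its result is optimal; on a cost function with several valleys it would simply stop in whichever valley it entered first.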

Like heuristics, metaheuristics may be considered a general algorithmic
framework that can be applied to different optimization problems
with relatively few modifications to adapt them to a specific problem
[25, 46]. The difference is that metaheuristics are intended to extend the 
capabilities of heuristics by combining one or more heuristic methods 
(referred to as procedures) using a higher-level strategy (hence 'meta'). 
A procedure in a metaheuristic is considered black-box in that little (if 
any) prior knowledge is known about it by the metaheuristic, and as 
such it may be replaced with a different procedure. Procedures may 
be as simple as the manipulation of a representation, or as complex 
as another complete metaheuristic. Some examples of metaheuristics 
include iterated local search, tabu search, the genetic algorithm, ant 
colony optimization, and simulated annealing. 
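
The relationship between a higher-level strategy and an embedded procedure can be sketched with random restarts wrapped around a local search. This is a deliberately simple stand-in rather than any of the named metaheuristics; the function names and the multi-valley cost function are assumptions for illustration.

```python
import math
import random


def local_search(cost, x, rng, step=0.1, iters=200):
    """Embedded heuristic procedure: accept only cost-improving steps."""
    fx = cost(x)
    for _ in range(iters):
        y = x + rng.uniform(-step, step)
        fy = cost(y)
        if fy < fx:
            x, fx = y, fy
    return x, fx


def random_restart(cost, procedure, low, high, restarts=30, seed=1):
    """Higher-level strategy: rerun the procedure from random start points."""
    rng = random.Random(seed)
    best_x, best_cost = None, float("inf")
    for _ in range(restarts):
        x, fx = procedure(cost, rng.uniform(low, high), rng)
        if fx < best_cost:
            best_x, best_cost = x, fx
    return best_x, best_cost


def cost(x):
    """Hypothetical cost with many valleys; the lowest is at x = 0."""
    return x * x + 10.0 * (1.0 - math.cos(2.0 * math.pi * x))


best_x, best_cost = random_restart(cost, local_search, -5.0, 5.0)
print(best_x, best_cost)
```

The strategy treats `procedure` as a black box: any function with the same signature could be passed in its place without changing `random_restart`.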

Blum and Roli outline nine properties of metaheuristics [9], as follows: 

• Metaheuristics are strategies that "guide" the search process. 

• The goal is to efficiently explore the search space in order to find 
(near-)optimal solutions. 

• Techniques which constitute metaheuristic algorithms range from 
simple local search procedures to complex learning processes. 

• Metaheuristic algorithms are approximate and usually non-deterministic. 

• They may incorporate mechanisms to avoid getting trapped in 
confined areas of the search space. 
• The basic concepts of metaheuristics permit an abstract level
description. 

• Metaheuristics are not problem-specific. 

• Metaheuristics may make use of domain-specific knowledge in the 
form of heuristics that are controlled by the upper level strategy. 

• Today's more advanced metaheuristics use search experience (em- 
bodied in some form of memory) to guide the search. 

Hyperheuristics are yet another extension that focuses on heuristics 
that modify their parameters (online or offline) to improve the efficacy 
of solution, or the efficiency of the computation. Hyperheuristics provide 
high-level strategies that may employ machine learning and adapt their 
search behavior by modifying the application of the sub-procedures or 
even which procedures are used (operating on the space of heuristics 
which in turn operate within the problem domain) [12, 13]. 
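
One way to picture a search over the space of heuristics is an online selector that rewards whichever low-level move has recently produced improvements. The two move operators, the credit scheme, and all parameter values below are hypothetical; this is a sketch of the general idea, not a method from [12, 13].

```python
import random


def small_step(x, rng):
    return x + rng.uniform(-0.1, 0.1)


def large_step(x, rng):
    return x + rng.uniform(-2.0, 2.0)


def hyper_search(cost, start, heuristics, iters=2000, seed=1):
    """Choose among low-level heuristics in proportion to their past success."""
    rng = random.Random(seed)
    scores = [1.0] * len(heuristics)  # credit assigned to each heuristic
    x, fx = start, cost(start)
    for _ in range(iters):
        # Selection happens in the space of heuristics, not of solutions.
        i = rng.choices(range(len(heuristics)), weights=scores)[0]
        y = heuristics[i](x, rng)
        fy = cost(y)
        if fy < fx:
            x, fx = y, fy
            scores[i] += 1.0  # reward a heuristic that improved the solution
        else:
            scores[i] = max(0.1, scores[i] * 0.99)  # slowly decay its credit
    return x, fx, scores


def cost(x):
    """Hypothetical cost with a single minimum at x = 3."""
    return (x - 3.0) ** 2


x, fx, scores = hyper_search(cost, -20.0, [small_step, large_step])
print(x, fx, scores)
```

The chosen heuristic operates on the problem, while the selector operates on the heuristics, which mirrors the two levels described above.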

1.1.5 Clever Algorithms 

This book is concerned with 'clever algorithms', which are algorithms 
drawn from many sub-fields of artificial intelligence not limited to 
the scruffy fields of biologically inspired computation, computational 
intelligence and metaheuristics. The term 'clever algorithms' is intended
to unify a collection of interesting and useful computational tools under 
a consistent and accessible banner. An alternative name (Inspired 
Algorithms) was considered, although ultimately rejected given that not 
all of the algorithms to be described in the project have an inspiration 
(specifically a biological or physical inspiration) for their computational 
strategy. The set of algorithms described in this book may generally be 
referred to as 'unconventional optimization algorithms' (for example, 
see [14]), as optimization is the main form of computation provided by 
the listed approaches. A technically more appropriate name for these 
approaches is stochastic global optimization (for example, see [49] and 
[35]). 

Algorithms were selected in order to provide a rich and interesting 
coverage of the fields of Biologically Inspired Computation, Metaheuris- 
tics and Computational Intelligence. Rather than a coverage of just 
the state-of-the-art and popular methods, the algorithms presented also 
include historic and newly described methods. The final selection was 
designed to provoke curiosity and encourage exploration and a wider 
view of the field. 
1.2 Problem Domains

Algorithms from the fields of Computational Intelligence, Biologically 
Inspired Computing, and Metaheuristics are applied to difficult problems, 
to which more traditional approaches may not be suited. Michalewicz 
and Fogel propose five reasons why problems may be difficult [37] (page 
11): 

• The number of possible solutions in the search space is so large as 
to forbid an exhaustive search for the best answer. 

• The problem is so complicated, that just to facilitate any answer 
at all, we have to use such simplified models of the problem that 
any result is essentially useless. 

• The evaluation function that describes the quality of any proposed 
solution is noisy or varies with time, thereby requiring not just a 
single solution but an entire series of solutions. 

• The possible solutions are so heavily constrained that constructing 
even one feasible answer is difficult, let alone searching for an 
optimal solution. 

• The person solving the problem is inadequately prepared or imag- 
ines some psychological barrier that prevents them from discovering 
a solution. 

This section introduces two problem formalisms that embody many 
of the most difficult problems faced by Artificial and Computational 
Intelligence. They are: Function Optimization and Function Approx- 
imation. Each class of problem is described in terms of its general 
properties, a formalism, and a set of specialized sub-problems. These
problem classes provide a tangible framing of the algorithmic techniques 
described throughout the work. 

1.2.1 Function Optimization 

Real-world optimization problems and generalizations thereof can be 
drawn from most fields of science, engineering, and information technology
(for a sample, see [2, 48]). Importantly, function optimization problems
have had a long tradition in the field of Artificial Intelligence in motivating
basic research into new problem solving techniques, and for
investigating and verifying systemic behavior against benchmark prob- 
lem instances. 
Problem Description

Mathematically, optimization is defined as the search for a combination
of parameters commonly referred to as decision variables
($x = \{x_1, x_2, x_3, \ldots, x_n\}$) which minimize or maximize some ordinal
quantity ($c$) (typically a scalar called a score or cost) assigned by an
objective function or cost function ($f$), under a set of constraints
($g = \{g_1, g_2, g_3, \ldots, g_n\}$). For example, a general minimization case
would be as follows: $f(x') \leq f(x), \forall x_i \in x$. Constraints may provide
boundaries on decision variables (for example in a real-value hypercube $\Re^n$), or may
generally define regions of feasibility and in-feasibility in the decision 
variable space. In applied mathematics the field may be referred to as 
Mathematical Programming. More generally the field may be referred 
to as Global or Function Optimization given the focus on the objective 
function. For more general information on optimization refer to Horst 
et al. [29]. 
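
The pieces of this formalism (decision variables, a cost function, and constraints that mark out a feasible region) can be sketched in a few lines. The sum-of-squares objective, the box bounds, and the candidate list are assumptions chosen for illustration.

```python
def objective(x):
    """Cost function f: sum of squares of the decision variables (assumed)."""
    return sum(xi ** 2 for xi in x)


def feasible(x, bounds):
    """Constraints g: every decision variable lies within its [low, high] bound."""
    return all(low <= xi <= high for xi, (low, high) in zip(x, bounds))


# Three decision variables, each bounded to [-5, 5] (a hypercube).
bounds = [(-5.0, 5.0)] * 3
candidates = [
    [1.0, 2.0, -1.0],
    [0.5, 0.0, 0.25],
    [6.0, 0.0, 0.0],  # violates the bound on the first variable
    [0.0, 0.0, 0.0],
]

feasible_set = [x for x in candidates if feasible(x, bounds)]
best = min(feasible_set, key=objective)

# best plays the role of x': its cost is no greater than that of any
# other feasible candidate considered.
print(best, objective(best))
```

Real problems differ mainly in that the candidate set is far too large to list, which is what the search techniques in this book are intended to address.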

Sub-Fields of Study 

The study of optimization is comprised of many specialized sub-fields, 
based on an overlapping taxonomy that focuses on the principal concerns
in the general formalism. For example, with regard to the decision
variables, one may consider univariate and multivariate optimization 
problems. The type of decision variables promotes specialities for con- 
tinuous, discrete, and permutations of variables. Dependencies between 
decision variables under a cost function define the fields of Linear Pro- 
gramming, Quadratic Programming, and Nonlinear Programming. A 
large class of optimization problems can be reduced to discrete sets 
and are considered in the field of Combinatorial Optimization, for which
many theoretical properties are known, most importantly that many
interesting and relevant problems are not known to be solvable by an
approach with polynomial time complexity (so-called NP-hard problems,
for example see Papadimitriou and Steiglitz [38]).

The evaluation of variables against a cost function may collectively
be considered a response surface. The shape of such a response surface
may be convex, which is a class of functions for which many important
theoretical findings have been made, not limited to the fact that locating
a local optimal configuration also means the global optimal configuration
of decision variables has been located [11]. Many interesting
and real-world optimization problems produce cost surfaces that are
non-convex or so-called multi-modal¹ (rather than unimodal), suggesting
that there are multiple peaks and valleys. Further, many real-world

¹ Taken from statistics referring to the centers of mass in distributions, although
in optimization it refers to 'regions of interest' in the search space, in particular 
valleys in minimization, and peaks in maximization cost surfaces. 
optimization problems with continuous decision variables cannot be
differentiated given their complexity or limited information availability,
meaning that derivative-based gradient descent methods (that are well
understood) are not applicable, necessitating the use of so-called 'direct
search' (sample or pattern-based) methods [33]. Real-world objective
function evaluation may be noisy, discontinuous, and/or dynamic, and 
the constraints of real-world problem solving may require an approx- 
imate solution in limited time or resources, motivating the need for 
heuristic approaches. 
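
The difference between a unimodal and a multi-modal response surface can be seen by sampling each on a grid and counting the valleys. Both test functions and the grid resolution are assumptions chosen for illustration.

```python
import math


def count_local_minima(f, low, high, n=1801):
    """Count grid points strictly lower than both of their neighbors."""
    xs = [low + (high - low) * i / (n - 1) for i in range(n)]
    ys = [f(x) for x in xs]
    return sum(
        1 for i in range(1, n - 1) if ys[i] < ys[i - 1] and ys[i] < ys[i + 1]
    )


def convex(x):
    """Unimodal surface: a single valley at x = 0."""
    return x * x


def multimodal(x):
    """Same bowl with a cosine ripple added, producing many valleys."""
    return x * x + 10.0 * (1.0 - math.cos(2.0 * math.pi * x))


print("convex valleys:", count_local_minima(convex, -4.5, 4.5))
print("multimodal valleys:", count_local_minima(multimodal, -4.5, 4.5))
```

On the convex surface any valley found is the global one, whereas on the rippled surface a method that only moves downhill can finish in a valley that is not the lowest.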

1.2.2 Function Approximation 

Real-world Function Approximation problems are among the most computationally
difficult considered in the broader field of Artificial Intelligence
for reasons including: incomplete information, high-dimensionality,
noise in the sample observations, and non-linearities in the target func- 
tion. This section considers the Function Approximation formalism and 
related specializations as a general motivating problem to contrast and 
compare with Function Optimization. 

Problem Description 

Function Approximation is the problem of finding a function (f) that
approximates a target function (g), where typically the approximated 
function is selected based on a sample of observations (x, also referred to 
as the training set) taken from the unknown target function. In machine 
learning, the function approximation formalism is used to describe 
general problem types commonly referred to as pattern recognition, 
such as classification, clustering, and curve fitting (called a decision 
or discrimination function). Such general problem types are described 
in terms of approximating an unknown Probability Density Function 
(PDF), which underlies the relationships in the problem space, and is 
represented in the sample data. This perspective of such problems is 
commonly referred to as statistical machine learning and/or density 
estimation [8, 24]. 
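
A minimal sketch of the formalism is to fit a straight line to noisy observations of a target function. The linear target, the noise level, and the choice of ordinary least squares are assumptions made for illustration; in practice the target function is unknown.

```python
import random


def fit_line(samples):
    """Least-squares slope and intercept for a list of (x, y) observations."""
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def target(x):
    """Stand-in for the unknown target function g."""
    return 2.0 * x + 1.0


# Training set: noisy observations sampled from the target function.
rng = random.Random(1)
samples = [(i / 10.0, target(i / 10.0) + rng.gauss(0.0, 0.1)) for i in range(50)]

slope, intercept = fit_line(samples)


def approx(x):
    """The approximating function f selected from the sample."""
    return slope * x + intercept


print(slope, intercept, approx(2.0))
```

The quality of `approx` depends entirely on how well the sample represents the target, which is why noise and incomplete information make such problems difficult.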

Sub-Fields of Study 

The function approximation formalism can be used to phrase some of the 
hardest problems faced by Computer Science, and Artificial Intelligence 
in particular, such as natural language processing and computer vision. 
The general process focuses on 1) the collection and preparation of the 
observations from the target function, 2) the selection and/or preparation 
of a model of the target function, and 3) the application and ongoing 
refinement of the prepared model. Some important problem-based
sub-fields include: 

• Feature Selection where a feature is considered an aggregation 
of one-or-more attributes, where only those features that have 
meaning in the context of the target function are necessary to the 
modeling function [27, 32]. 

• Classification where observations are inherently organized into 
labelled groups (classes) and a supervised process models an un- 
derlying discrimination function to classify unobserved samples. 

• Clustering where observations may be organized into groups based 
on underlying common features, although the groups are unlabeled 
requiring a process to model an underlying discrimination function 
without corrective feedback. 

• Curve or Surface Fitting where a model is prepared that provides a 
'best-fit' (called a regression) for a set of observations that may be 
used for interpolation over known observations and extrapolation 
for observations outside what has been modeled. 

The field of Function Optimization is related to Function Approximation,
as many sub-problems of Function Approximation may be
defined as optimization problems. Many of the technique paradigms 
used for function approximation are differentiated based on the rep- 
resentation and the optimization process used to minimize error or 
maximize effectiveness on a given approximation problem. The difficulty 
of Function Approximation problems centers on 1) the nature of the
unknown relationships between attributes and features, 2) the number 
(dimensionality) of attributes and features, and 3) general concerns of 
noise in such relationships and the dynamic availability of samples from 
the target function. Additional difficulties include the incorporation of 
prior knowledge (such as imbalance in samples, incomplete information 
and the variable reliability of data), and problems of invariant features 
(such as transformation, translation, rotation, scaling, and skewing of 
features).
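
The link between the two problem classes can be sketched by treating the parameters of a model as decision variables and the approximation error as the cost function. The linear model, the noise-free sample, and the simple stochastic search below are assumptions for illustration.

```python
import random


def mse(params, samples):
    """Cost function: mean squared error of the line a * x + b on the sample."""
    a, b = params
    return sum((a * x + b - y) ** 2 for x, y in samples) / len(samples)


def fit_by_search(samples, iters=5000, step=0.1, seed=1):
    """Minimize approximation error by accepting only improving perturbations."""
    rng = random.Random(seed)
    best = [0.0, 0.0]
    best_err = mse(best, samples)
    for _ in range(iters):
        candidate = [p + rng.uniform(-step, step) for p in best]
        err = mse(candidate, samples)
        if err < best_err:
            best, best_err = candidate, err
    return best, best_err


# Hypothetical observations taken from the line y = 3x - 2.
samples = [(i / 10.0, 3.0 * (i / 10.0) - 2.0) for i in range(20)]

params, err = fit_by_search(samples)
print(params, err)
```

Swapping the representation (the model) or the optimization process (the search) yields a different approximation technique, which is the sense in which the paradigms are differentiated.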

1.3 Unconventional Optimization 

Not all algorithms described in this book are for optimization, although 
those that are may be referred to as 'unconventional' to differentiate 
them from the more traditional approaches. Examples of traditional 
approaches include (but are not limited to) mathematical optimization
algorithms (such as Newton's method and Gradient Descent that use 
derivatives to locate a local minimum) and direct search methods (such 
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as the Simplex method and the Nelder-Mead method that use a search 
pattern to locate optima). Unconventional optimization algorithms are 
designed for the more difficult problem instances, the attributes of which 
were introduced in Section 1.2.1. This section introduces some common 
attributes of this class of algorithm. 

1.3.1 Black Box Algorithms 

Black Box optimization algorithms are those that exploit little, if any, 
information from a problem domain in order to devise a solution. They 
are generalized problem solving procedures that may be applied to a 
range of problems with very little modification [19]. Domain specific 
knowledge refers to known relationships between solution representations 
and the objective cost function. Generally speaking, the less domain 
specific information incorporated into a technique, the more flexible 
the technique, although the less efficient it will be for a given problem. 
For example, 'random search' is the most general black box approach 
and is also the most flexible requiring only the generation of random 
solutions for a given problem. Random search allows resampling of 
the domain which gives it a worst case behavior that is worse than 
enumerating the entire search domain. In practice, the more prior 
knowledge available about a problem, the more information that can 
be exploited by a technique in order to efficiently locate a solution for 
the problem, heuristically or otherwise. Therefore, black box methods 
are those methods suitable for those problems where little information 
from the problem domain is available to be used by a problem solving 
approach. 
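The trade-off described above can be sketched on a small discrete domain. In the example below both procedures see the problem only through a cost function passed as a block. Enumeration visits each point exactly once, whereas random search draws points with replacement and may evaluate the same point repeatedly. The domain, the cost function, and the names `enumerate_domain` and `random_search` are assumptions chosen for illustration.

```ruby
# Exhaustive enumeration: every point in the domain is evaluated exactly once.
def enumerate_domain(domain)
  best = nil
  domain.each do |candidate|
    cost = yield(candidate)
    best = { value: candidate, cost: cost } if best.nil? || cost < best[:cost]
  end
  best
end

# Black box random search: points are drawn with replacement, so the
# same point may be evaluated more than once.
def random_search(domain, max_evals, rng)
  best = nil
  evaluated = []
  max_evals.times do
    candidate = domain[rng.rand(domain.size)]
    evaluated << candidate
    cost = yield(candidate)
    best = { value: candidate, cost: cost } if best.nil? || cost < best[:cost]
  end
  [best, evaluated]
end

domain = (0...20).to_a
cost = ->(x) { (x - 13)**2 } # hypothetical cost function, minimum at x = 13
best_enum = enumerate_domain(domain, &cost)
best_rand, evaluated = random_search(domain, domain.size, Random.new(1), &cost)
puts "enumeration: best=#{best_enum[:value]} in #{domain.size} evaluations"
puts "random search: best=#{best_rand[:value]}, " \
     "#{evaluated.uniq.size} distinct points in #{evaluated.size} evaluations"
```

Given the same evaluation budget, random search typically covers fewer distinct points than enumeration, which is why its worst case is worse.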

1.3.2 No-Free-Lunch 

The No-Free-Lunch Theorem of search and optimization by Wolpert 
and Macready proposes that all black box optimization algorithms 
are the same for searching for the extremum of a cost function when 
averaged over all possible functions [50, 51]. The theorem has caused 
a lot of pessimism and misunderstanding, particularly in relation to 
the evaluation and comparison of Metaheuristic and Computational 
Intelligence algorithms. 

The implication of the theorem is that searching for the 'best' general- 
purpose black box optimization algorithm is irresponsible as no such 
procedure is theoretically possible. No-Free-Lunch applies to stochastic 
and deterministic optimization algorithms as well as to algorithms that 
learn and adjust their search strategy over time. It is independent of 
the performance measure used and the representation selected. Wolpert 
and Macready's original paper was produced at a time when grandiose 
generalizations were being made as to algorithm, representation, or
configuration superiority. The practical impact of the theory is to
encourage practitioners to bound claims of applicability for search and 
optimization algorithms. Wolpert and Macready encouraged effort 
be put into devising practical problem classes and into the matching 
of suitable algorithms to problem classes. Further, they compelled 
practitioners to exploit domain knowledge in optimization algorithm 
application, which is now an axiom in the field. 
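The averaging argument can be checked by brute force on a toy problem. The sketch below enumerates all eight cost functions from three search points to two cost values and compares three non-revisiting strategies by the lowest cost found after a fixed number of evaluations. The strategies, the problem size, and all names are assumptions made for illustration; this is a demonstration on one tiny instance, not a proof of the theorem.

```ruby
# All cost functions f: {0,1,2} -> {0,1}, each stored as an array indexed by point.
def all_functions
  [0, 1].product([0, 1], [0, 1])
end

# Run a non-revisiting strategy for a number of evaluations and
# return the lowest cost it has seen.
def best_after(strategy, f, evals)
  history = []
  evals.times do
    x = strategy.call(history)
    history << [x, f[x]]
  end
  history.map(&:last).min
end

# Average the result over every possible cost function.
def average_best(strategy, evals)
  functions = all_functions
  functions.sum { |f| best_after(strategy, f, evals) } / functions.size.to_f
end

# Three hypothetical strategies. Each maps the history of [point, cost]
# pairs to the next unvisited point.
fixed_order = ->(history) { ([0, 1, 2] - history.map(&:first)).first }
reverse_order = ->(history) { ([2, 1, 0] - history.map(&:first)).first }
adaptive = lambda do |history|
  next 0 if history.empty?
  # choose the next point based on the most recent cost observed
  preferred = history.last[1] == 0 ? [1, 2] : [2, 1]
  (preferred - history.map(&:first)).first
end

(1..3).each do |evals|
  averages = [fixed_order, reverse_order, adaptive].map { |s| average_best(s, evals) }
  puts "evaluations=#{evals}, average best cost per strategy=#{averages.inspect}"
end
```

The averages coincide only because every possible function is weighted equally. Restricting attention to a structured problem class breaks the symmetry, which is the practical message of the theorem.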

1.3.3 Stochastic Optimization 

Stochastic optimization algorithms are those that use randomness to 
elicit non-deterministic behaviors, contrasted to purely deterministic 
procedures. Most algorithms from the fields of Computational Intelli- 
gence, Biologically Inspired Computation, and Metaheuristics may be 
considered to belong to the field of Stochastic Optimization. Algorithms
that exploit randomness are not random in behavior, rather they sample
a problem space in a biased manner, focusing on areas of interest and
neglecting less interesting areas [45]. A class of techniques that focus on
the stochastic sampling of a domain, called Markov Chain Monte Carlo 
(MCMC) algorithms, provide good average performance, and generally 
offer a low chance of the worst case performance. Such approaches are 
suited to problems with many coupled degrees of freedom, for example 
large, high-dimensional spaces. MCMC approaches involve stochastically 
sampling from a target distribution function similar to Monte Carlo 
simulation methods using a process that resembles a biased Markov 
chain. 

• Monte Carlo methods are used for selecting a statistical sample 
to approximate a given target probability density function and 
are traditionally used in statistical physics. Samples are drawn 
sequentially and the process may include criteria for rejecting sam- 
ples and biasing the sampling locations within high-dimensional 
spaces. 

• Markov Chain processes provide a probabilistic model for state 
transitions or moves within a discrete domain called a walk or a 
chain of steps. A Markov system is only dependent on the current 
position in the domain in order to probabilistically determine the 
next step in the walk. 

MCMC techniques combine these two approaches to solve integration 
and optimization problems in large dimensional spaces by generating 
samples while exploring the space using a Markov chain process, rather 
than sequentially or independently [3]. The step generation is configured
to bias sampling in more important regions of the domain. Three examples
of MCMC techniques include the Metropolis-Hastings algorithm,
Simulated Annealing for global optimization, and the Gibbs sampler
which are commonly employed in the fields of physics, chemistry, statis- 
tics, and economics. 
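The combination of a Markov chain walk with Monte Carlo sampling can be sketched with a random-walk Metropolis sampler, the symmetric-proposal special case of Metropolis-Hastings. The example below draws samples from a one-dimensional standard normal density known only up to a constant. The target density, the proposal width, the seed, and the function names are assumptions chosen for illustration.

```ruby
# Unnormalized standard normal density. Only ratios of densities are used,
# so the normalizing constant is never needed.
def target_density(x)
  Math.exp(-0.5 * x * x)
end

# Random-walk Metropolis: each proposal depends only on the current state
# (the Markov property), and acceptance biases the walk toward regions of
# high density.
def metropolis(num_samples, step, rng)
  current = 0.0
  samples = []
  num_samples.times do
    proposal = current + (rng.rand * 2.0 - 1.0) * step # symmetric proposal
    accept_prob = [1.0, target_density(proposal) / target_density(current)].min
    current = proposal if rng.rand < accept_prob
    samples << current # a rejected proposal repeats the current state
  end
  samples
end

samples = metropolis(20_000, 2.0, Random.new(42))
mean = samples.sum / samples.size
var = samples.sum { |x| (x - mean)**2 } / samples.size
puts "sample mean=#{mean.round(3)}, sample variance=#{var.round(3)}"
```

The sample mean and variance should approach 0 and 1 as the chain grows, although successive samples are correlated, so convergence is slower than for independent draws.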

1.3.4 Inductive Learning 

Many unconventional optimization algorithms employ a process that 
includes the iterative improvement of candidate solutions against an ob- 
jective cost function. This process of adaptation is generally a method by 
which the process obtains characteristics that improve the system's (can- 
didate solution) relative performance in an environment (cost function). 
This adaptive behavior is commonly achieved through a 'selectionist 
process' of repetition of the steps: generation, test, and selection. The 
use of non-deterministic processes means that the sampling of the domain
(the generation step) is typically non-parametric, although guided by 
past experience. 
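The generation, test, and selection steps can be sketched as a minimal loop that maintains a single candidate solution. The cost function, the perturbation scheme, and the names `sphere_cost`, `generate`, and `adapt` are assumptions made for this illustration.

```ruby
# Environment: a simple cost function to be minimized.
def sphere_cost(vector)
  vector.sum { |x| x * x }
end

# Generation: sample a new candidate near the current one (guided by past experience).
def generate(parent, step, rng)
  parent.map { |x| x + (rng.rand * 2.0 - 1.0) * step }
end

def adapt(initial, iterations, step, rng)
  current = { vector: initial, cost: sphere_cost(initial) }
  iterations.times do
    vector = generate(current[:vector], step, rng)             # generation
    candidate = { vector: vector, cost: sphere_cost(vector) }  # test
    current = candidate if candidate[:cost] <= current[:cost]  # selection
  end
  current
end

start = [4.0, -3.0]
result = adapt(start, 200, 0.5, Random.new(3))
puts "start cost=#{sphere_cost(start)}, adapted cost=#{result[:cost]}"
```

Because selection only ever keeps a candidate that is no worse, the retained solution cannot deteriorate, although nothing in the loop guarantees that it reaches the optimum.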

The method of acquiring information is called inductive learning or 
learning from example, where the approach uses the implicit assumption 
that specific examples are representative of the broader information 
content of the environment, specifically with regard to anticipated 
need. Many unconventional optimization approaches maintain a single 
candidate solution, a population of samples, or a compression thereof that 
provides both an instantaneous representation of all of the information 
acquired by the process, and the basis for generating and making future 
decisions. 

This method of simultaneously acquiring and improving information
from the domain and the optimization of decision making (where to
direct future effort) is called the k-armed bandit (two-armed and
multi-armed bandit) problem from the field of statistical decision making
known as game theory [7, 42]. This formalism considers the capability
of a strategy to allocate available resources proportional to the future 
payoff the strategy is expected to receive. The classic example is the 
2-armed bandit problem used by Goldberg to describe the behavior of 
the genetic algorithm [26]. The example involves an agent that learns
which one of the two slot machines provides more return by pulling the 
handle of each (sampling the domain) and biasing future handle pulls 
proportional to the expected utility, based on the probabilistic experience 
with the past distribution of the payoff. The formalism may also be 
used to understand the properties of inductive learning demonstrated by 
the adaptive behavior of most unconventional optimization algorithms. 
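The two-armed bandit example can be sketched as follows. The agent keeps a running estimate of each arm's payoff and chooses the next arm with probability proportional to those estimates. The Bernoulli payoffs, the single optimistic pseudo-observation used to initialize each estimate, and the function names are assumptions made for illustration; other allocation rules are possible.

```ruby
# Pull an arm that pays 1.0 with the given probability, otherwise 0.0.
def pull(arm_probability, rng)
  rng.rand < arm_probability ? 1.0 : 0.0
end

def play_bandit(arm_probabilities, num_pulls, rng)
  # One optimistic pseudo-observation per arm keeps every estimate above zero.
  pulls = Array.new(arm_probabilities.size, 1)
  payoff = Array.new(arm_probabilities.size, 1.0)
  num_pulls.times do
    estimates = payoff.zip(pulls).map { |total, n| total / n }
    # Roulette-wheel choice: probability proportional to estimated payoff.
    threshold = rng.rand * estimates.sum
    arm = 0
    cumulative = estimates[0]
    while cumulative < threshold && arm < estimates.size - 1
      arm += 1
      cumulative += estimates[arm]
    end
    payoff[arm] += pull(arm_probabilities[arm], rng)
    pulls[arm] += 1
  end
  pulls.map { |n| n - 1 } # remove the pseudo-observations
end

pulls = play_bandit([0.3, 0.7], 2000, Random.new(7))
puts "pulls per arm: #{pulls.inspect}"
```

The arm with the higher payoff probability receives the larger share of pulls, while the weaker arm continues to be sampled occasionally so that its estimate stays current.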

The stochastic iterative process of generate and test can be
computationally wasteful, potentially re-searching areas of the problem
space already searched, and requiring many trials or samples in order
to achieve a 'good enough' solution. The limited use of prior knowledge
from the domain (black box) coupled with the stochastic sampling
process means that the adapted solutions, created without top-down
insight or instruction, can sometimes be interesting, innovative, and even
competitive with decades of human expertise [31].

1.4 Book Organization 

The remainder of this book is organized into two parts: Algorithms that
describes a large number of techniques in a complete and consistent
manner presented in rough algorithm groups, and Extensions that
reviews more advanced topics suitable for when a number of algorithms 
have been mastered. 

1.4.1 Algorithms 

Algorithms are presented in seven groups or kingdoms distilled from the
broader fields of study, each in their own chapter, as follows:

• Stochastic Algorithms that focus on the introduction of randomness
into heuristic methods (Chapter 2).

• Evolutionary Algorithms inspired by evolution by means of natural
selection (Chapter 3).

• Physical Algorithms inspired by physical and social systems
(Chapter 4).

• Probabilistic Algorithms that focus on methods that build models
and estimate distributions in search domains (Chapter 5).

• Swarm Algorithms that focus on methods that exploit the properties
of collective intelligence (Chapter 6).

• Immune Algorithms inspired by the adaptive immune system of 
vertebrates (Chapter 7). 

• Neural Algorithms inspired by the plasticity and learning qualities 
of the human nervous system (Chapter 8). 

A given algorithm is more than just a procedure or code listing; each
approach is an island of research. The meta-information that defines the
context of a technique is just as important to understanding and
application as abstract recipes and concrete implementations. A standardized
algorithm description is adopted to provide a consistent presentation of 
algorithms with a mixture of softer narrative descriptions, programmatic 
descriptions both abstract and concrete, and most importantly useful 
sources for finding out more information about the technique. 

The standardized algorithm description template covers the following
subjects:

• Name: The algorithm name defines the canonical name used to
refer to the technique, in addition to common aliases, abbreviations,
and acronyms. The name is used as the heading of an algorithm
description.

• Taxonomy: The algorithm taxonomy defines where a technique
fits into the field, both the specific sub-fields of Computational
Intelligence and Biologically Inspired Computation as well as the
broader field of Artificial Intelligence. The taxonomy also provides
a context for determining the relationships between algorithms.

• Inspiration: (where appropriate) The inspiration describes the
specific system or process that provoked the inception of the
algorithm. The inspiring system may non-exclusively be natural,
biological, physical, or social. The description of the inspiring
system may include relevant domain specific theory, observation,
nomenclature, and those salient attributes of the system that are
somehow abstractly or conceptually manifest in the technique.

• Metaphor: (where appropriate) The metaphor is a description of
the technique in the context of the inspiring system or a different
suitable system. The features of the technique are made apparent
through an analogous description of the features of the inspiring
system. The explanation through analogy is not expected to be
literal, rather the method is used as an allegorical communication
tool. The inspiring system is not explicitly described; this is the
role of the 'inspiration' topic, which represents a loose dependency
for this topic.

• Strategy: The strategy is an abstract description of the
computational model. The strategy describes the information processing
actions a technique shall take in order to achieve an objective,
providing a logical separation between a computational realization
(procedure) and an analogous system (metaphor). A given
problem solving strategy may be realized as one of a number of
specific algorithms or problem solving systems.

• Procedure: The algorithmic procedure summarizes the specifics of
realizing a strategy as a systemized and parameterized computation.
It outlines how the algorithm is organized in terms of the
computation, data structures, and representations.

• Heuristics: The heuristics section describes the commonsense, best
practice, and demonstrated rules for applying and configuring a
parameterized algorithm. The heuristics relate to the technical
details of the technique's procedure and data structures for general
classes of application (neither specific implementations nor specific
problem instances).

• Code Listing: The code listing description provides a minimal but 
functional version of the technique implemented with a program- 
ming language. The code description can be typed into a computer 
and provide a working execution of the technique. The technique 
implementation also includes a minimal problem instance to which 
it is applied, and both the problem and algorithm implementations 
are complete enough to demonstrate the technique's procedure.
The description is presented as a programming source code listing 
with a terse introductory summary. 

• References: The references section includes a listing of both pri- 
mary sources of information about the technique as well as useful 
introductory sources for novices to gain a deeper understanding 
of the theory and application of the technique. The description 
consists of hand-selected reference material including books, peer 
reviewed conference papers, and journal articles. 

Source code examples are included in the algorithm descriptions, and 
the Ruby Programming Language was selected for use throughout the 
book. Ruby was selected because it supports the procedural program- 
ming paradigm, adopted to ensure that examples can be easily ported 
to object-oriented and other paradigms. Additionally, Ruby is an inter- 
preted language, meaning the code can be directly executed without an 
introduced compilation step, and it is free to download and use from the 
Internet. 2 Ruby is concise, expressive, and supports meta-programming 
features that improve the readability of code examples. 

The sample code provides a working version of a given technique for 
demonstration purposes. Having a tinker with a technique can really 
bring it to life and provide valuable insight into a method. The sample 
code is a minimum implementation, providing plenty of opportunity to 
explore, extend and optimize. All of the source code for the algorithms 
presented in this book is available from the companion website, online at 
http://www.CleverAlgorithms.com. All algorithm implementations 
were tested with Ruby 1.8.6, 1.8.7 and 1.9. 

1.4.2 Extensions 

There are some advanced topics that cannot be meaningfully
considered until one has a firm grasp of a number of algorithms, and
these are discussed at the back of the book. The Advanced Topics chapter
addresses topics such as: the use of alternative programming paradigms
when implementing clever algorithms, methodologies used when devising
entirely new approaches, strategies to consider when testing clever
algorithms, visualizing the behavior and results of algorithms, and
comparing algorithms based on the results they produce using statistical
methods. Like the background information provided in this chapter, the
extensions provide a gentle introduction and starting point into some
advanced topics, and references for seeking a deeper understanding.

2 Ruby can be downloaded for free from http://www.ruby-lang.org



1.5 How to Read this Book 

This book is a reference text that provides a large compendium of algo- 
rithm descriptions. It is a trusted handbook of practical computational 
recipes to be consulted when one is confronted with difficult function 
optimization and approximation problems. It is also an encompass- 
ing guidebook of modern heuristic methods that may be browsed for 
inspiration, exploration, and general interest. 

The audience for this work may be interested in the fields of Com- 
putational Intelligence, Biologically Inspired Computation, and Meta- 
heuristics and may count themselves as belonging to one of the following 
broader groups: 

• Scientists: Research scientists concerned with theoretically or
empirically investigating algorithms, addressing questions such as:
What is the motivating system and strategy for a given technique?
What are some algorithms that may be used in a comparison within
a given subfield or across subfields?

• Engineers: Programmers and developers concerned with implementing,
applying, or maintaining algorithms, addressing questions
such as: What is the procedure for a given technique? What are
the best practice heuristics for employing a given technique?

• Students: Undergraduate and graduate students interested in 
learning about techniques, addressing questions such as: What are 
some interesting algorithms to study? How to implement a given 
approach? 

• Amateurs: Practitioners interested in knowing more about algorithms,
addressing questions such as: What classes of techniques
exist and what algorithms do they provide? How to conceptualize
the computation of a technique?

1.6 Further Reading

This book is not an introduction to Artificial Intelligence or related 
sub-fields, nor is it a field guide for a specific class of algorithms. This 
section provides some pointers to selected books and articles for those 
readers seeking a deeper understanding of the fields of study to which 
the Clever Algorithms described in this book belong. 

1.6.1 Artificial Intelligence 

Artificial Intelligence is a large field of study and many excellent texts have
been written to introduce the subject. Russell and Norvig's "Artificial
Intelligence: A Modern Approach" is an excellent introductory text
providing a broad and deep review of what the field has to offer and is
useful for students and practitioners alike [43]. Luger and Stubblefield's
"Artificial Intelligence: Structures and Strategies for Complex Problem
Solving" is also an excellent reference text, providing a more empirical
approach to the field [34].

1.6.2 Computational Intelligence 

Introductory books for the field of Computational Intelligence gen- 
erally focus on a handful of specific sub-fields and their techniques. 
Engelbrecht's "Computational Intelligence: An Introduction" provides a
modern and detailed introduction to the field covering classic subjects 
such as Evolutionary Computation and Artificial Neural Networks, as 
well as more recent techniques such as Swarm Intelligence and Artificial 
Immune Systems [20]. Pedrycz's slightly more dated "Computational 
Intelligence: An Introduction" also provides a solid coverage of the core 
of the field with some deeper insights into fuzzy logic and fuzzy systems 
[41]. 

1.6.3 Biologically Inspired Computation 

Computational methods inspired by natural and biological systems
represent a large portion of the algorithms described in this book. The
collection of articles published in de Castro and Von Zuben's "Recent
Developments in Biologically Inspired Computing" provides an overview
of the state of the field, and the introductory chapter on the need for
such methods does an excellent job of motivating the field of study [17].
Forbes's "Imitation of Life: How Biology Is Inspiring Computing" sets
the scene for Natural Computing and the interrelated disciplines, of
which Biologically Inspired Computing is but one useful example [22].
Finally, Benyus's "Biomimicry: Innovation Inspired by Nature" provides
a good introduction into the broader related field of a new frontier in
science and technology that involves building systems inspired by an
understanding of the biological world [6].

1.6.4 Metaheuristics 

The field of Metaheuristics was initially constrained to heuristics for
applying classical optimization procedures, although it has expanded to
encompass a broader and diverse set of techniques. Michalewicz and
Fogel's "How to Solve It: Modern Heuristics" provides a practical
tour of heuristic methods with a consistent set of worked examples
[37]. Glover and Kochenberger's "Handbook of Metaheuristics" provides
a solid introduction into a broad collection of techniques and their
capabilities [25].

1.6.5 The Ruby Programming Language 

The Ruby Programming Language is a multi-paradigm dynamic lan- 
guage that appeared in approximately 1995. Its meta-programming 
capabilities coupled with concise and readable syntax have made it a 
popular language of choice for web development, scripting, and applica- 
tion development. The classic reference text for the language is Thomas, 
Fowler, and Hunt's "Programming Ruby: The Pragmatic Programmers'
Guide" referred to as the 'pickaxe book' because of the picture of the 
pickaxe on the cover [47]. An updated edition is available that covers 
version 1.9 (compared to 1.8 in the cited version) that will work just 
as well for use as a reference for the examples in this book. Flanagan 
and Matsumoto's "The Ruby Programming Language" also provides a 
seminal reference text with contributions from Yukihiro Matsumoto, 
the author of the language [21]. For more information on the Ruby
Programming Language, see the quick-start guide in Appendix A.

Chapter 2

Stochastic Algorithms 



2.1 Overview 

This chapter describes Stochastic Algorithms.

2.1.1 Stochastic Optimization

The majority of the algorithms to be described in this book are com- 
prised of probabilistic and stochastic processes. What differentiates the 
'stochastic algorithms' in this chapter from the remaining algorithms 
is the specific lack of 1) an inspiring system, and 2) a metaphorical 
explanation. Both 'inspiration' and 'metaphor' refer to the descriptive 
elements in the standardized algorithm description. 

These described algorithms are predominately global optimization 
algorithms and metaheuristics that manage the application of an em- 
bedded neighborhood exploring (local) search procedure. As such, with 
the exception of 'Stochastic Hill Climbing' and 'Random Search' the 
algorithms may be considered extensions of the multi-start search (also 
known as multi-restart search). This set of algorithms provides various
different strategies by which 'better' and varied starting points can 
be generated and issued to a neighborhood searching technique for 
refinement, a process that is repeated with potentially improving or 
unexplored areas to search. 
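The multi-start idea described above can be sketched as follows: a new starting point is generated, handed to an embedded local search for refinement, and the best refined result is retained across restarts. The one-dimensional cost function, the step size, the iteration counts, and all function names are assumptions chosen for illustration.

```ruby
# A one-dimensional cost function with several local minima (hypothetical).
def bumpy_cost(x)
  x * x + 10.0 * (1.0 - Math.cos(2.0 * Math::PI * x))
end

# Embedded local search: move to a nearby point only if it is no worse.
def local_search(start, bounds, step, iterations, rng)
  current = { x: start, cost: bumpy_cost(start) }
  iterations.times do
    x = current[:x] + (rng.rand * 2.0 - 1.0) * step
    x = [[x, bounds[0]].max, bounds[1]].min # keep the point inside the bounds
    cost = bumpy_cost(x)
    current = { x: x, cost: cost } if cost <= current[:cost]
  end
  current
end

# Multi-start: repeat the local search from fresh random starting points.
def multi_start(bounds, restarts, rng)
  best = nil
  restarts.times do
    start = bounds[0] + (bounds[1] - bounds[0]) * rng.rand
    refined = local_search(start, bounds, 0.1, 100, rng)
    best = refined if best.nil? || refined[:cost] < best[:cost]
  end
  best
end

single = multi_start([-5.0, 5.0], 1, Random.new(11))
many = multi_start([-5.0, 5.0], 20, Random.new(11))
puts "1 start: cost=#{single[:cost]}, 20 starts: cost=#{many[:cost]}"
```

With the same seed, the first restart of both runs is identical, so the run with more restarts can only match or improve on the single-start result.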
2.2 Random Search



Random Search, RS, Blind Random Search, Blind Search, Pure Random 
Search, PRS 



2.2.1 Taxonomy 

Random search belongs to the fields of Stochastic Optimization and 
Global Optimization. Random search is a direct search method as it 
does not require derivatives to search a continuous domain. This base 
approach is related to techniques that provide small improvements such 
as Directed Random Search, and Adaptive Random Search (Section 2.3). 



2.2.2 Strategy 

The strategy of Random Search is to sample solutions from across the 
entire search space using a uniform probability distribution. Each future 
sample is independent of the samples that come before it. 



2.2.3 Procedure 

Algorithm 2.2.1 provides a pseudocode listing of the Random Search 
Algorithm for minimizing a cost function. 



Algorithm 2.2.1: Pseudocode for Random Search.
Input: NumIterations, ProblemSize, SearchSpace
Output: Best

Best ← ∅;
foreach iter_i ∈ NumIterations do
    candidate_i ← RandomSolution(ProblemSize, SearchSpace);
    if Cost(candidate_i) < Cost(Best) then
        Best ← candidate_i;
    end
end
return Best;



2.2.4 Heuristics 

• Random search is minimal in that it only requires a candidate
solution construction routine and a candidate solution evaluation
routine, both of which may be calibrated using the approach.

• The worst case performance for Random Search for locating the
optima is worse than an Enumeration of the search domain, given
that Random Search has no memory and can blindly resample.

• Random Search can return a reasonable approximation of the 
optimal solution within a reasonable time under low problem 
dimensionality, although the approach does not scale well with 
problem size (such as the number of dimensions) . 

• Care must be taken with some problem domains to ensure that
random candidate solution construction is unbiased.

• The results of a Random Search can be used to seed another search 
technique, like a local search technique (such as the Hill Climbing 
algorithm) that can be used to locate the best solution in the 
neighborhood of the 'good' candidate solution. 
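The heuristic on unbiased construction can be made concrete with a permutation-based problem, which is an assumed domain chosen for this illustration. The sketch below compares two ways of building a random permutation by enumerating every sequence of random draws each routine could make, rather than by sampling. The function names are hypothetical.

```ruby
# Naive construction: swap every position with one drawn from the whole array.
def naive_shuffle(items, choices)
  a = items.dup
  a.each_index do |i|
    j = choices[i]
    a[i], a[j] = a[j], a[i]
  end
  a
end

# Fisher-Yates construction: swap position i with one drawn from i..n-1.
def fisher_yates_shuffle(items, choices)
  a = items.dup
  a.each_index do |i|
    j = i + choices[i]
    a[i], a[j] = a[j], a[i]
  end
  a
end

# Count how often each permutation is produced.
def count_outcomes(results)
  counts = Hash.new(0)
  results.each { |r| counts[r] += 1 }
  counts
end

items = [:a, :b, :c]
naive_draws = [0, 1, 2].product([0, 1, 2], [0, 1, 2]) # 27 equally likely draw sequences
fy_draws = [0, 1, 2].product([0, 1], [0])             # 6 equally likely draw sequences
naive_counts = count_outcomes(naive_draws.map { |c| naive_shuffle(items, c) })
fy_counts = count_outcomes(fy_draws.map { |c| fisher_yates_shuffle(items, c) })
puts "naive: #{naive_counts.values.sort.inspect}"
puts "fisher-yates: #{fy_counts.values.sort.inspect}"
```

The naive routine cannot be uniform because 27 draw sequences cannot be divided evenly among 6 permutations, so a Random Search built on it would favor some candidate solutions over others.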

2.2.5 Code Listing 

Listing 2.1 provides an example of the Random Search Algorithm
implemented in the Ruby Programming Language. In the example, the
algorithm runs for a fixed number of iterations and returns the best
candidate solution discovered. The example problem is an instance of a
continuous function optimization that seeks $\min f(x)$ where
$f = \sum_{i=1}^{n} x_i^{2}$, $-5.0 \leq x_i \leq 5.0$ and $n = 2$. The
optimal solution for this basin function is $(v_0, \ldots, v_{n-1}) = 0.0$.

def objective_function(vector)
  return vector.inject(0) {|sum, x| sum + (x ** 2.0)}
end

def random_vector(minmax)
  return Array.new(minmax.size) do |i|
    minmax[i][0] + ((minmax[i][1] - minmax[i][0]) * rand())
  end
end

def search(search_space, max_iter)
  best = nil
  max_iter.times do |iter|
    candidate = {}
    candidate[:vector] = random_vector(search_space)
    candidate[:cost] = objective_function(candidate[:vector])
    best = candidate if best.nil? or candidate[:cost] < best[:cost]
    puts " > iteration=#{(iter+1)}, best=#{best[:cost]}"
  end
  return best
end

if __FILE__ == $0
  # problem configuration
  problem_size = 2
  search_space = Array.new(problem_size) {|i| [-5, +5]}
  # algorithm configuration
  max_iter = 100
  # execute the algorithm
  best = search(search_space, max_iter)
  puts "Done. Best Solution: c=#{best[:cost]}, v=#{best[:vector].inspect}"
end

Listing 2.1: Random Search in Ruby 



2.2.6 References

Primary Sources

There is no seminal specification of the Random Search algorithm, rather 
there are discussions of the general approach and related random search 
methods from the 1950s through to the 1970s. This was around the time 
that pattern and direct search methods were actively researched. Brooks 
is credited with the so-called 'pure random search' [1]. Two seminal 
reviews of 'random search methods' of the time include: Karnopp [2] 
and perhaps Kul'chitskii [3]. 

Learn More 

For overviews of Random Search Methods see Zhigljavsky [9], Solis and 
Wets [4], and also White [7] who provide an insightful review article. 
Spall provides a detailed overview of the field of Stochastic Optimization, 
including the Random Search method [5] (for example, see Chapter 2). 
For a shorter introduction by Spall, see [6] (specifically Section 6.2). 
Also see Zabinsky for another detailed review of the broader field [8].

2.3 Adaptive Random Search

Adaptive Random Search, ARS, Adaptive Step Size Random Search, 
ASSRS, Variable Step-Size Random Search. 

2.3.1 Taxonomy 

The Adaptive Random Search algorithm belongs to the general set of 
approaches known as Stochastic Optimization and Global Optimization. 
It is a direct search method in that it does not require derivatives to 
navigate the search space. Adaptive Random Search is an extension 
of the Random Search (Section 2.2) and Localized Random Search 
algorithms. 

2.3.2 Strategy 

The Adaptive Random Search algorithm was designed to address the 
limitations of the fixed step size in the Localized Random Search al- 
gorithm. The strategy for Adaptive Random Search is to continually 
approximate the optimal step size required to reach the global optimum 
in the search space. This is achieved by trialling and adopting smaller 
or larger step sizes only if they result in an improvement in the search 
performance. 

The strategy of the Adaptive Step Size Random Search algorithm
(the specific technique reviewed) is to trial a larger step in each iteration 
and adopt the larger step if it results in an improved result. Very large 
step sizes are trialled in the same manner although with a much lower 
frequency. This strategy of preferring large moves is intended to allow 
the technique to escape local optima. Smaller step sizes are adopted if 
no improvement is made for an extended period. 

2.3.3 Procedure 

Algorithm 2.3.1 provides a pseudocode listing of the Adaptive Random
Search Algorithm for minimizing a cost function based on the
specification for 'Adaptive Step-Size Random Search' by Schumer and
Steiglitz [6].

Algorithm 2.3.1: Pseudocode for Adaptive Random Search.
Input: Iter_max, Problem_size, SearchSpace, StepSize_factor^init,
       StepSize_factor^small, StepSize_factor^large, StepSize_factor^iter,
       NoChange_max
Output: S

NoChange_count ← 0;
StepSize_i ← InitializeStepSize(SearchSpace, StepSize_factor^init);
S ← RandomSolution(Problem_size, SearchSpace);
for i = 0 to Iter_max do
    S_1 ← TakeStep(SearchSpace, S, StepSize_i);
    StepSize_i^large ← 0;
    if i mod StepSize_factor^iter = 0 then
        StepSize_i^large ← StepSize_i × StepSize_factor^large;
    else
        StepSize_i^large ← StepSize_i × StepSize_factor^small;
    end
    S_2 ← TakeStep(SearchSpace, S, StepSize_i^large);
    if Cost(S_1) ≤ Cost(S) or Cost(S_2) ≤ Cost(S) then
        if Cost(S_2) ≤ Cost(S_1) then
            S ← S_2;
            StepSize_i ← StepSize_i^large;
        else
            S ← S_1;
        end
        NoChange_count ← 0;
    else
        NoChange_count ← NoChange_count + 1;
        if NoChange_count > NoChange_max then
            NoChange_count ← 0;
            StepSize_i ← StepSize_i / StepSize_factor^small;
        end
    end
end
return S;

2.3.4 Heuristics

• Adaptive Random Search was designed for continuous function
optimization problem domains.

• Candidates with equal cost should be considered improvements
to allow the algorithm to make progress across plateaus in the
response surface.

• Adaptive Random Search may adapt the search direction in addition
to the step size.

• The step size may be adapted for all parameters, or for each 
parameter individually. 
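The per-parameter option in the last heuristic can be sketched by giving the step routine an array of step sizes, one for each parameter, in place of a single scalar. The function `take_step_per_parameter`, the explicit random number generator argument, and the example values are assumptions made for this illustration and are not part of the Schumer and Steiglitz specification.

```ruby
# Draw a uniform random value between min and max.
def rand_in_bounds(min, max, rng)
  min + (max - min) * rng.rand
end

# Take a random step in which parameter i moves at most step_sizes[i]
# from its current value, clipped to the bounds of the search space.
def take_step_per_parameter(minmax, current, step_sizes, rng)
  Array.new(current.size) do |i|
    min = [minmax[i][0], current[i] - step_sizes[i]].max
    max = [minmax[i][1], current[i] + step_sizes[i]].min
    rand_in_bounds(min, max, rng)
  end
end

bounds = [[-5.0, 5.0], [-5.0, 5.0]]
current = [0.0, 0.0]
step_sizes = [0.1, 2.0] # hypothetical: fine steps in x0, coarse steps in x1
position = take_step_per_parameter(bounds, current, step_sizes, Random.new(5))
puts "new position: #{position.inspect}"
```

Each entry of `step_sizes` could then be enlarged or reduced independently, using the same acceptance test that the scalar version applies to its single step size.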

2.3.5 Code Listing 

Listing 2.2 provides an example of the Adaptive Random Search
Algorithm implemented in the Ruby Programming Language, based on
the specification for 'Adaptive Step-Size Random Search' by Schumer
and Steiglitz [6]. In the example, the algorithm runs for a fixed number
of iterations and returns the best candidate solution discovered. The
example problem is an instance of a continuous function optimization
that seeks $\min f(x)$ where $f = \sum_{i=1}^{n} x_i^{2}$,
$-5.0 \leq x_i \leq 5.0$ and $n = 2$. The optimal solution for this basin
function is $(v_0, \ldots, v_{n-1}) = 0.0$.
# sum of squares (basin function); the minimum is 0.0 at the origin
def objective_function(vector)
  return vector.inject(0) {|sum, x| sum + (x ** 2.0)}
end

def rand_in_bounds(min, max)
  return min + ((max-min) * rand())
end

def random_vector(minmax)
  Array.new(minmax.size) do |i|
    rand_in_bounds(minmax[i][0], minmax[i][1])
  end
end

# sample a new position within step_size of current, clipped to the bounds
def take_step(minmax, current, step_size)
  position = Array.new(current.size)
  position.size.times do |i|
    min = [minmax[i][0], current[i]-step_size].max
    max = [minmax[i][1], current[i]+step_size].min
    position[i] = rand_in_bounds(min, max)
  end
  return position
end

# every iter_mult iterations use the large factor, otherwise the small one
def large_step_size(iter, step_size, s_factor, l_factor, iter_mult)
  return step_size * l_factor if iter>0 and iter.modulo(iter_mult) == 0
  return step_size * s_factor
end

def take_steps(bounds, current, step_size, big_stepsize)
  step, big_step = {}, {}
  step[:vector] = take_step(bounds, current[:vector], step_size)
  step[:cost] = objective_function(step[:vector])
  big_step[:vector] = take_step(bounds, current[:vector], big_stepsize)
  big_step[:cost] = objective_function(big_step[:vector])
  return step, big_step
end

def search(max_iter, bounds, init_factor, s_factor, l_factor,
    iter_mult, max_no_impr)
  step_size = (bounds[0][1]-bounds[0][0]) * init_factor
  current, count = {}, 0
  current[:vector] = random_vector(bounds)
  current[:cost] = objective_function(current[:vector])
  max_iter.times do |iter|
    big_stepsize = large_step_size(iter, step_size, s_factor, l_factor,
      iter_mult)
    step, big_step = take_steps(bounds, current, step_size,
      big_stepsize)
    if step[:cost] <= current[:cost] or big_step[:cost] <= current[:cost]
      if big_step[:cost] <= step[:cost]
        # the larger step did at least as well: adopt the larger step size
        step_size, current = big_stepsize, big_step
      else
        current = step
      end
      count = 0
    else
      count += 1
      # too many iterations without improvement: shrink the step size
      count, step_size = 0, (step_size/s_factor) if count >= max_no_impr
    end
    puts " > iteration #{(iter+1)}, best=#{current[:cost]}"
  end
  return current
end

if __FILE__ == $0
  # problem configuration
  problem_size = 2
  bounds = Array.new(problem_size) {|i| [-5, +5]}
  # algorithm configuration
  max_iter = 1000
  init_factor = 0.05
  s_factor = 1.3
  l_factor = 3.0
  iter_mult = 10
  max_no_impr = 30
  # execute the algorithm
  best = search(max_iter, bounds, init_factor, s_factor, l_factor,
    iter_mult, max_no_impr)
  puts "Done. Best Solution: c=#{best[:cost]}, v=#{best[:vector].inspect}"
end




Listing 2.2: Adaptive Random Search in Ruby 
2.3.6 References
Primary Sources

Many works in the 1960s and 1970s experimented with variable step sizes
for Random Search methods. Schumer and Steiglitz are commonly
credited with the adaptive step size procedure, which they called
'Adaptive Step-Size Random Search' [6]. Their approach only modifies
the step size based on an approximation of the optimal step size required
to reach the global optimum. Kregting and White review adaptive random
search methods and propose an approach called 'Adaptive Directional
Random Search' that modifies both the algorithm's step size and direction
in response to the cost function [2].

Learn More 

White reviews extensions to Rastrigin's 'Creeping Random Search' [4] 
(fixed step size) that use probabilistic step sizes drawn stochastically 
from uniform and probabilistic distributions [7]. White also reviews 
works that propose dynamic control strategies for the step size, such as 
Karnopp [1] who proposes increases and decreases to the step size based 
on performance over very small numbers of trials. Schrack and Choit 
review random search methods that modify their step size in order to 
approximate optimal moves while searching, including the property of 
reversal [5]. Masri et al. describe an adaptive random search strategy
that alternates between periods of fixed and variable step sizes [3].
2.4 Stochastic Hill Climbing

Stochastic Hill Climbing, SHC, Random Hill Climbing, RHC, Random 
Mutation Hill Climbing, RMHC 

2.4.1 Taxonomy 

The Stochastic Hill Climbing algorithm is a Stochastic Optimization
algorithm and is a Local Optimization algorithm (contrasted to Global
Optimization). It is a direct search technique, as it does not require
derivatives of the search space. Stochastic Hill Climbing is an extension
of deterministic hill climbing algorithms such as Simple Hill Climbing
(first-best neighbor) and Steepest-Ascent Hill Climbing (best neighbor),
and a parent of approaches such as Parallel Hill Climbing and
Random-Restart Hill Climbing.

2.4.2 Strategy 

The strategy of the Stochastic Hill Climbing algorithm is to iterate the
process of randomly selecting a neighbor for a candidate solution and
accepting it only if it results in an improvement. The strategy was
proposed to address the limitations of deterministic hill climbing
techniques that were likely to get stuck in local optima due to their
greedy acceptance of neighboring moves.

2.4.3 Procedure 

Algorithm 2.4.1 provides a pseudocode listing of the Stochastic Hill
Climbing algorithm, specifically the Random Mutation Hill Climbing
algorithm described by Forrest and Mitchell, applied to a maximization
optimization problem [3].



Algorithm 2.4.1: Pseudocode for Stochastic Hill Climbing.
Input: Iter_max, ProblemSize
Output: Current

Current ← RandomSolution(ProblemSize);
foreach iter_i ∈ Iter_max do
    Candidate ← RandomNeighbor(Current);
    if Cost(Candidate) ≥ Cost(Current) then
        Current ← Candidate;
    end
end
return Current;
2.4.4 Heuristics

• Stochastic Hill Climbing was designed to be used in discrete 
domains with explicit neighbors such as combinatorial optimization 
(compared to continuous function optimization). 

• The algorithm's strategy may be applied to continuous domains 
by making use of a step-size to define candidate-solution neighbors 
(such as Localized Random Search and Fixed Step-Size Random 
Search). 

• Stochastic Hill Climbing is a local search technique (compared 
to global search) and may be used to refine a result after the 
execution of a global search algorithm. 

• Even though the technique uses a stochastic process, it can still 
get stuck in local optima. 

• Neighbors with better or equal cost should be accepted, allowing 
the technique to navigate across plateaus in the response surface. 

• The algorithm can be restarted and repeated a number of times 
after it converges to provide an improved result (called Multiple 
Restart Hill Climbing). 

• The procedure can be applied to multiple candidate solutions 
concurrently, allowing multiple algorithm runs to be performed at 
the same time (called Parallel Hill Climbing). 
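
The restart heuristic above can be sketched on the One Max problem. This is a minimal illustration rather than a definitive implementation: the wrapper name multi_restart, the number of restarts, and the iteration budget are assumptions of this sketch, and a seeded random number generator is passed in only to make runs repeatable.

```ruby
# One Max objective (maximized): the number of '1' bits.
def onemax(bits)
  bits.count("1")
end

def random_bitstring(num_bits, rng)
  Array.new(num_bits) { rng.rand < 0.5 ? "1" : "0" }
end

# Flip one randomly chosen bit.
def random_neighbor(bits, rng)
  mutant = bits.dup
  pos = rng.rand(bits.size)
  mutant[pos] = (mutant[pos] == "1") ? "0" : "1"
  mutant
end

# One run of stochastic hill climbing for a fixed number of iterations.
def hill_climb(num_bits, max_iter, rng)
  current = random_bitstring(num_bits, rng)
  max_iter.times do
    neighbor = random_neighbor(current, rng)
    current = neighbor if onemax(neighbor) >= onemax(current)
  end
  current
end

# Hypothetical restart wrapper: keep the best result of several
# independent runs, each started from a new random solution.
def multi_restart(num_restarts, num_bits, max_iter, rng=Random.new)
  best = nil
  num_restarts.times do
    result = hill_climb(num_bits, max_iter, rng)
    best = result if best.nil? || onemax(result) > onemax(best)
  end
  best
end

best = multi_restart(5, 16, 50, Random.new(42))
puts "best cost=#{onemax(best)}, v=#{best.join}"
```

Because the runs are independent, the same wrapper could also distribute them across workers, which corresponds to the parallel variant mentioned in the final heuristic.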

2.4.5 Code Listing 

Listing 2.3 provides an example of the Stochastic Hill Climbing algo- 
rithm implemented in the Ruby Programming Language, specifically 
the Random Mutation Hill Climbing algorithm described by Forrest and 
Mitchell [3]. The algorithm is executed for a fixed number of iterations 
and is applied to a binary string optimization problem called 'One Max'. 
The objective of this maximization problem is to prepare a string of all
'1' bits, where the cost function reports only the number of '1' bits in
a given string.

# count the '1' bits in the bit string
def onemax(vector)
  return vector.inject(0.0){|sum, v| sum + ((v=="1") ? 1 : 0)}
end

def random_bitstring(num_bits)
  return Array.new(num_bits){|i| (rand() < 0.5) ? "1" : "0"}
end

# copy the bit string and flip one randomly selected bit
def random_neighbor(bitstring)
  mutant = Array.new(bitstring)
  pos = rand(bitstring.size)
  mutant[pos] = (mutant[pos]=='1') ? '0' : '1'
  return mutant
end

def search(max_iterations, num_bits)
  candidate = {}
  candidate[:vector] = random_bitstring(num_bits)
  candidate[:cost] = onemax(candidate[:vector])
  max_iterations.times do |iter|
    neighbor = {}
    neighbor[:vector] = random_neighbor(candidate[:vector])
    neighbor[:cost] = onemax(neighbor[:vector])
    # accept neighbors of equal or better cost (maximizing)
    candidate = neighbor if neighbor[:cost] >= candidate[:cost]
    puts " > iteration #{(iter+1)}, best=#{candidate[:cost]}"
    break if candidate[:cost] == num_bits
  end
  return candidate
end

if __FILE__ == $0
  # problem configuration
  num_bits = 64
  # algorithm configuration
  max_iterations = 1000
  # execute the algorithm
  best = search(max_iterations, num_bits)
  puts "Done. Best Solution: c=#{best[:cost]}, v=#{best[:vector].join}"
end



Listing 2.3: Stochastic Hill Climbing in Ruby 



2.4.6 References 
Primary Sources 

Perhaps the most popular implementation of the Stochastic Hill Climb- 
ing algorithm is by Forrest and Mitchell, who proposed the Random 
Mutation Hill Climbing (RMHC) algorithm (with communication from 
Richard Palmer) in a study that investigated the behavior of the ge- 
netic algorithm on a deceptive class of (discrete) bit-string optimization 
problems called 'royal road' functions [3]. The RMHC was compared to 
two other hill climbing algorithms in addition to the genetic algorithm, 
specifically: the Steepest-Ascent Hill Climber, and the Next-Ascent Hill 
Climber. This study was then followed up by Mitchell and Holland [5]. 

Juels and Wattenberg were also early to consider stochastic hill
climbing as an approach to compare to the genetic algorithm [4]. Skalak
applied the RMHC algorithm to a single long bit-string that represented
a number of prototype vectors for use in classification [8].
Learn More

The Stochastic Hill Climbing algorithm is related to the genetic algorithm
without crossover. Simplified versions of the approach are investigated
for bit-string based optimization problems with the population size of
the genetic algorithm reduced to one. The general technique has been
investigated under the names Iterated Hillclimbing [6], ES(1+1,m,hc)
[7], Random Bit Climber [2], and (1+1)-Genetic Algorithm [1]. The
main difference between RMHC and ES(1+1) is that the latter uses a
fixed probability of a mutation for each discrete element of a solution
(meaning the neighborhood size is probabilistic), whereas RMHC will
only stochastically modify one element.
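
The difference between the two neighborhoods can be seen in a short sketch. The function names and the mutation probability of 1/n are assumptions chosen for illustration and are not taken from the cited works.

```ruby
# RMHC-style neighbor: flip exactly one randomly chosen bit.
def single_bit_flip(bits, rng)
  mutant = bits.dup
  pos = rng.rand(bits.size)
  mutant[pos] = (mutant[pos] == "1") ? "0" : "1"
  mutant
end

# (1+1)-style neighbor: flip each bit independently with probability
# p_mutation, so the number of changed bits varies between calls.
def per_bit_mutation(bits, p_mutation, rng)
  bits.map do |b|
    (rng.rand < p_mutation) ? ((b == "1") ? "0" : "1") : b
  end
end

# Number of positions at which two bit strings differ.
def hamming(a, b)
  a.zip(b).count { |x, y| x != y }
end

rng = Random.new(7)
parent = Array.new(32) { "0" }
one = single_bit_flip(parent, rng)
many = per_bit_mutation(parent, 1.0 / parent.size, rng)
puts "single flip changed #{hamming(parent, one)} bit(s)"
puts "per-bit mutation changed #{hamming(parent, many)} bit(s)"
```

The single-bit flip always produces a neighbor at Hamming distance one, whereas the per-bit mutation may change no bits, one bit, or several bits in a single move.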
2.5 Iterated Local Search

Iterated Local Search, ILS. 

2.5.1 Taxonomy 

Iterated Local Search is a Metaheuristic and a Global Optimization
technique. It is an extension of Multi-Restart Search and may be
considered a parent of many two-phase search approaches such as the
Greedy Randomized Adaptive Search Procedure (Section 2.8) and Variable
Neighborhood Search (Section 2.7).

2.5.2 Strategy 

The objective of Iterated Local Search is to improve upon stochastic
Multi-Restart Search by sampling in the broader neighborhood of
candidate solutions and using a Local Search technique to refine
solutions to their local optima. Iterated Local Search explores a
sequence of solutions created as perturbations of the current best
solution, the result of which is refined using an embedded heuristic.

2.5.3 Procedure 

Algorithm 2.5.1 provides a pseudocode listing of the Iterated Local 
Search algorithm for minimizing a cost function. 



Algorithm 2.5.1: Pseudocode for Iterated Local Search.
Input:
Output: S_best

S_best ← ConstructInitialSolution();
S_best ← LocalSearch();
SearchHistory ← S_best;
while ¬ StopCondition() do
    S_candidate ← Perturbation(S_best, SearchHistory);
    S_candidate ← LocalSearch(S_candidate);
    SearchHistory ← S_candidate;
    if AcceptanceCriterion(S_best, S_candidate, SearchHistory) then
        S_best ← S_candidate;
    end
end
return S_best;
2.5.4 Heuristics

• Iterated Local Search was designed for and has been predominantly
applied to discrete domains, such as combinatorial optimization
problems.

• The perturbation of the current best solution should be in a 
neighborhood beyond the reach of the embedded heuristic and 
should not be easily undone. 

• Perturbations that are too small make the algorithm too greedy;
perturbations that are too large make the algorithm too stochastic.

• The embedded heuristic is most commonly a problem-specific local 
search technique. 

• The starting point for the search may be a randomly constructed 
candidate solution, or constructed using a problem-specific heuris- 
tic (such as nearest neighbor). 

• Perturbations can be made deterministically, although stochastic 
and probabilistic (adaptive based on history) are the most common. 

• The procedure may store as much or as little history as needed to 
be used during perturbation and acceptance criteria. No history 
represents a random walk in a larger neighborhood of the best 
solution and is the most common implementation of the approach. 

• The simplest and most common acceptance criterion is an
improvement in the cost of constructed candidate solutions.



2.5.5 Code Listing 

Listing 2.4 provides an example of the Iterated Local Search algorithm 
implemented in the Ruby Programming Language. The algorithm is 
applied to the Berlin52 instance of the Traveling Salesman Problem 
(TSP), taken from the TSPLIB. The problem seeks a permutation of 
the order to visit cities (called a tour) that minimizes the total distance 
traveled. The optimal tour distance for the Berlin52 instance is 7542 units.

The Iterated Local Search runs for a fixed number of iterations. The 
implementation is based on a common algorithm configuration for the 
TSP, where a 'double-bridge move' (4-opt) is used as the perturbation 
technique, and a stochastic 2-opt is used as the embedded Local Search 
heuristic. The double-bridge move involves partitioning a permutation 
into 4 pieces (a,b,c,d) and putting it back together in a specific and 
jumbled ordering (a,d,c,b). 
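
The reordering can be traced on a small tour. The sketch below fixes the three cut points by hand so the result is easy to follow; choosing them at random is the more usual configuration, and the function name and cut positions here are assumptions for illustration only.

```ruby
# Double-bridge move with explicit cut points.
# The tour is split into pieces a, b, c, d and rejoined as a, d, c, b.
def double_bridge(perm, pos1, pos2, pos3)
  a = perm[0...pos1]
  b = perm[pos1...pos2]
  c = perm[pos2...pos3]
  d = perm[pos3..-1]
  a + d + c + b
end

tour = [0, 1, 2, 3, 4, 5, 6, 7]
moved = double_bridge(tour, 2, 4, 6)
puts moved.inspect  # => [0, 1, 6, 7, 4, 5, 2, 3]
```

Here a = [0, 1], b = [2, 3], c = [4, 5], and d = [6, 7]. Every city still appears exactly once, but three edges of the tour have changed, which a single 2-opt move cannot undo.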
# rounded Euclidean distance between two cities
def euc_2d(c1, c2)
  Math.sqrt((c1[0] - c2[0])**2.0 + (c1[1] - c2[1])**2.0).round
end

# total length of the closed tour
def cost(permutation, cities)
  distance = 0
  permutation.each_with_index do |c1, i|
    c2 = (i==permutation.size-1) ? permutation[0] : permutation[i+1]
    distance += euc_2d(cities[c1], cities[c2])
  end
  return distance
end

def random_permutation(cities)
  perm = Array.new(cities.size){|i| i}
  perm.each_index do |i|
    r = rand(perm.size-i) + i
    perm[r], perm[i] = perm[i], perm[r]
  end
  return perm
end

# reverse the tour segment between two randomly selected points
def stochastic_two_opt(permutation)
  perm = Array.new(permutation)
  c1, c2 = rand(perm.size), rand(perm.size)
  exclude = [c1]
  exclude << ((c1==0) ? perm.size-1 : c1-1)
  exclude << ((c1==perm.size-1) ? 0 : c1+1)
  c2 = rand(perm.size) while exclude.include?(c2)
  c1, c2 = c2, c1 if c2 < c1
  perm[c1...c2] = perm[c1...c2].reverse
  return perm
end

# repeat 2-opt moves until max_no_improv moves in a row fail to improve
def local_search(best, cities, max_no_improv)
  count = 0
  begin
    candidate = {:vector=>stochastic_two_opt(best[:vector])}
    candidate[:cost] = cost(candidate[:vector], cities)
    count = (candidate[:cost] < best[:cost]) ? 0 : count+1
    best = candidate if candidate[:cost] < best[:cost]
  end until count >= max_no_improv
  return best
end

# split into (a,b,c,d) and reassemble as (a,d,c,b)
def double_bridge_move(perm)
  pos1 = 1 + rand(perm.size / 4)
  pos2 = pos1 + 1 + rand(perm.size / 4)
  pos3 = pos2 + 1 + rand(perm.size / 4)
  p1 = perm[0...pos1] + perm[pos3..perm.size]
  p2 = perm[pos2...pos3] + perm[pos1...pos2]
  return p1 + p2
end

def perturbation(cities, best)
  candidate = {}
  candidate[:vector] = double_bridge_move(best[:vector])
  candidate[:cost] = cost(candidate[:vector], cities)
  return candidate
end

def search(cities, max_iterations, max_no_improv)
  best = {}
  best[:vector] = random_permutation(cities)
  best[:cost] = cost(best[:vector], cities)
  best = local_search(best, cities, max_no_improv)
  max_iterations.times do |iter|
    candidate = perturbation(cities, best)
    candidate = local_search(candidate, cities, max_no_improv)
    best = candidate if candidate[:cost] < best[:cost]
    puts " > iteration #{(iter+1)}, best=#{best[:cost]}"
  end
  return best
end

if __FILE__ == $0
  # problem configuration
  berlin52 = [[565,575],[25,185],[345,750],[945,685],[845,655],
   [880,660],[25,230],[525,1000],[580,1175],[650,1130],[1605,620],
   [1220,580],[1465,200],[1530,5],[845,680],[725,370],[145,665],
   [415,635],[510,875],[560,365],[300,465],[520,585],[480,415],
   [835,625],[975,580],[1215,245],[1320,315],[1250,400],[660,180],
   [410,250],[420,555],[575,665],[1150,1160],[700,580],[685,595],
   [685,610],[770,610],[795,645],[720,635],[760,650],[475,960],
   [95,260],[875,920],[700,500],[555,815],[830,485],[1170,65],
   [830,610],[605,625],[595,360],[1340,725],[1740,245]]
  # algorithm configuration
  max_iterations = 100
  max_no_improv = 50
  # execute the algorithm
  best = search(berlin52, max_iterations, max_no_improv)
  puts "Done. Best Solution: c=#{best[:cost]}, v=#{best[:vector].inspect}"
end

Listing 2.4: Iterated Local Search in Ruby 



2.5.6 References 
Primary Sources 

The definition and framework for Iterated Local Search was described
by Stützle in his PhD dissertation [12]. Specifically, he proposed
constraints on what constitutes an Iterated Local Search algorithm as 1)
a single chain of candidate solutions, and 2) the method used to
improve candidate solutions occurs within a reduced space by a black-box
heuristic. Stützle does not take credit for the approach, instead
highlighting specific instances of Iterated Local Search from the
literature, such as 'iterated descent' [1], 'large-step Markov chains'
[7], 'iterated Lin-Kernighan' [3], 'chained local optimization' [6], as
well as [2] that introduces the principle, and [4] that summarizes it
(list taken from [8]).

Learn More 

Two early technical reports by Stützle that present applications of
Iterated Local Search include a report on the Quadratic Assignment
Problem [10], and another on the permutation flow shop problem [9].
Stützle and Hoos also published an early paper studying Iterated Local
Search for the TSP [11]. Lourenço, Martin, and Stützle provide
a concise presentation of the technique, related techniques and the
framework, much as it is presented in Stützle's dissertation [5]. The
same authors also present an authoritative summary of the approach
and its applications as a book chapter [8].
2.6 Guided Local Search

Guided Local Search, GLS. 

2.6.1 Taxonomy 

The Guided Local Search algorithm is a Metaheuristic and a Global 
Optimization algorithm that makes use of an embedded Local Search 
algorithm. It is an extension to Local Search algorithms such as Hill 
Climbing (Section 2.4) and is similar in strategy to the Tabu Search 
algorithm (Section 2.10) and the Iterated Local Search algorithm (Sec- 
tion 2.5). 

2.6.2 Strategy 

The strategy for the Guided Local Search algorithm is to use penalties to
encourage a Local Search technique to escape local optima and discover
the global optimum. A Local Search algorithm is run until it gets stuck
in a local optimum. The features from the local optimum are evaluated
and penalized, the results of which are used in an augmented cost
function employed by the Local Search procedure. The Local Search
is repeated a number of times using the last local optimum discovered
and the augmented cost function that guides exploration away from
solutions with features present in discovered local optima.

2.6.3 Procedure 

Algorithm 2.6.1 provides a pseudocode listing of the Guided Local Search
algorithm for minimization. The Local Search algorithm used by the
Guided Local Search algorithm uses an augmented cost function in the
form $h(s) = g(s) + \lambda \cdot \sum_{i=1}^{M} f_i$, where $h(s)$ is
the augmented cost function, $g(s)$ is the problem cost function,
$\lambda$ is the 'regularization parameter' (a coefficient for scaling
the penalties), $s$ is a locally optimal solution of $M$ features, and
$f_i$ is the i'th feature in the locally optimal solution. The
augmented cost function is only used by the local search procedure; the
Guided Local Search algorithm uses the problem-specific cost function
without augmentation.
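
A small worked example may make the augmented cost concrete. The sketch below assumes that a feature is an undirected edge of a tour and that the summed term is the penalty counter of each edge present in the solution. The four-city distance matrix, the penalty values, and the parameter name lam are illustrative assumptions.

```ruby
# List the undirected edges of a closed tour as [low, high] index pairs.
def tour_edges(perm)
  perm.each_index.map do |i|
    c1, c2 = perm[i], perm[(i + 1) % perm.size]
    c1 < c2 ? [c1, c2] : [c2, c1]
  end
end

# Returns [g, h]: the plain tour cost and the augmented cost
# h = g + lam * (sum of penalties of the edges in the tour).
def augmented_cost(perm, dist, penalties, lam)
  edges = tour_edges(perm)
  g = edges.sum { |c1, c2| dist[c1][c2] }
  pen = edges.sum { |c1, c2| penalties[c1][c2] }
  [g, g + lam * pen]
end

dist = [[0, 1, 2, 1],
        [1, 0, 1, 2],
        [2, 1, 0, 1],
        [1, 2, 1, 0]]
penalties = Array.new(4) { Array.new(4, 0) }
penalties[0][1] = 2   # edge (0,1) has been penalized twice
g, h = augmented_cost([0, 1, 2, 3], dist, penalties, 0.5)
puts "g=#{g}, h=#{h}"  # => g=4, h=5.0
```

The local search would compare candidate tours by h, so a tour that avoids the penalized edge becomes relatively more attractive, while the best solution is still tracked by g.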

Penalties are only updated for those features in a locally optimal
solution that maximize utility, updated by adding 1 to the penalty
for the feature (a counter). The utility for a feature is calculated as
$U_{feature} = \frac{C_{feature}}{1 + P_{feature}}$, where $U_{feature}$
is the utility for penalizing a feature (maximizing), $C_{feature}$ is
the cost of the feature, and $P_{feature}$ is the current penalty for
the feature.
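
The effect of this update rule over several rounds can be sketched as follows. The feature costs are made-up values, and storing costs and penalties in parallel arrays is an assumption of this sketch.

```ruby
# Utility of penalizing a feature: cost / (1 + current penalty).
def utility(cost, penalty)
  cost / (1.0 + penalty)
end

# Return new penalties with 1 added to every feature of maximal utility.
def update_penalties(feature_costs, penalties)
  utilities = feature_costs.zip(penalties).map { |c, pen| utility(c, pen) }
  max = utilities.max
  penalties.each_index.map do |i|
    utilities[i] == max ? penalties[i] + 1 : penalties[i]
  end
end

costs = [10.0, 8.0, 3.0]
penalties = [0, 0, 0]
3.times do
  penalties = update_penalties(costs, penalties)
  puts penalties.inspect
end
# prints [1, 0, 0], then [1, 1, 0], then [2, 1, 0]
```

The most expensive feature is penalized first, but each penalty lowers its utility, so the update moves on to the next most expensive feature rather than penalizing the same one indefinitely.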
Algorithm 2.6.1: Pseudocode for Guided Local Search.
Input: Iter_max, λ
Output: S_best

f_penalties ← ∅;
S_best ← RandomSolution();
foreach Iter_i ∈ Iter_max do
    S_curr ← LocalSearch(S_best, λ, f_penalties);
    f_utilities ← CalculateFeatureUtilities(S_curr, f_penalties);
    f_penalties ← UpdateFeaturePenalties(S_curr, f_penalties,
        f_utilities);
    if Cost(S_curr) < Cost(S_best) then
        S_best ← S_curr;
    end
end
return S_best;



2.6.4 Heuristics 

• The Guided Local Search procedure is independent of the Local 
Search procedure embedded within it. A suitable domain-specific 
search procedure should be identified and employed. 

• The Guided Local Search procedure may need to be executed for 
thousands to hundreds-of-thousands of iterations, each iteration of 
which assumes a run of a Local Search algorithm to convergence. 

• The algorithm was designed for discrete optimization problems 
where a solution is comprised of independently assessable 'features' 
such as Combinatorial Optimization, although it has been applied 
to continuous function optimization modeled as binary strings. 

• The λ parameter is a scaling factor for feature penalization that
must be in the same proportion to the candidate solution costs
from the specific problem instance to which the algorithm is being
applied. As such, the value for λ must be meaningful when used
within the augmented cost function (such as when it is added to
a candidate solution cost in minimization and subtracted from a
cost in the case of a maximization problem).

2.6.5 Code Listing 

Listing 2.5 provides an example of the Guided Local Search algorithm 
implemented in the Ruby Programming Language. The algorithm is 
applied to the Berlin52 instance of the Traveling Salesman Problem 
(TSP), taken from the TSPLIB. The problem seeks a permutation of
the order to visit cities (called a tour) that minimizes the total distance
traveled. The optimal tour distance for the Berlin52 instance is 7542 units.

The implementation of the algorithm for the TSP was based on the
configuration specified by Voudouris in [7]. A TSP-specific local search
algorithm called 2-opt is used, which selects two points in a permutation
and reconnects the tour, potentially untwisting the tour at the selected
points. The stopping condition for 2-opt was configured to be a fixed
number of non-improving moves.

The equation for setting λ for TSP instances is
$\lambda = \alpha \cdot \frac{cost(optima)}{N}$, where $N$ is the number
of cities, $cost(optima)$ is the cost of a local optimum found by a local
search, and $\alpha \in (0, 1]$ (around 0.3 for TSP and 2-opt). The cost
of a local optimum was fixed to an approximated value for the Berlin52
instance (local_search_optima = 12000.0 in the listing). The utility
function for features (edges) in the TSP is
$U_{edge} = \frac{D_{edge}}{1 + P_{edge}}$, where $U_{edge}$ is the
utility for penalizing an edge (maximizing), $D_{edge}$ is the cost of
the edge (distance between cities) and $P_{edge}$ is the current penalty
for the edge.
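
As a quick check of the arithmetic, the following sketch evaluates both formulas. The values of alpha, local_search_optima, and the number of cities are those configured at the end of Listing 2.5; the edge length of 100 is a hypothetical value chosen for illustration.

```ruby
# Values from the algorithm configuration in the listing.
alpha = 0.3
local_search_optima = 12000.0
num_cities = 52

# lambda = alpha * cost(optima) / N
lambda_value = alpha * (local_search_optima / num_cities)
puts "lambda = #{lambda_value.round(2)}"  # => lambda = 69.23

# Utility of penalizing an edge: distance / (1 + penalty).
def edge_utility(distance, penalty)
  distance / (1.0 + penalty)
end

# A hypothetical edge of length 100: utility falls as its penalty grows.
puts edge_utility(100, 0)  # => 100.0
puts edge_utility(100, 3)  # => 25.0
```

With these settings each penalty unit adds roughly 69 distance units to the augmented cost of a tour that uses the penalized edge, which is of the same order as a typical edge length in the instance.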

# rounded Euclidean distance between two cities
def euc_2d(c1, c2)
  Math.sqrt((c1[0] - c2[0])**2.0 + (c1[1] - c2[1])**2.0).round
end

def random_permutation(cities)
  perm = Array.new(cities.size){|i| i}
  perm.each_index do |i|
    r = rand(perm.size-i) + i
    perm[r], perm[i] = perm[i], perm[r]
  end
  return perm
end

# reverse the tour segment between two randomly selected points
def stochastic_two_opt(permutation)
  perm = Array.new(permutation)
  c1, c2 = rand(perm.size), rand(perm.size)
  exclude = [c1]
  exclude << ((c1==0) ? perm.size-1 : c1-1)
  exclude << ((c1==perm.size-1) ? 0 : c1+1)
  c2 = rand(perm.size) while exclude.include?(c2)
  c1, c2 = c2, c1 if c2 < c1
  perm[c1...c2] = perm[c1...c2].reverse
  return perm
end

# returns [tour distance, distance plus lambda-scaled edge penalties]
def augmented_cost(permutation, penalties, cities, lambda)
  distance, augmented = 0, 0
  permutation.each_with_index do |c1, i|
    c2 = (i==permutation.size-1) ? permutation[0] : permutation[i+1]
    c1, c2 = c2, c1 if c2 < c1
    d = euc_2d(cities[c1], cities[c2])
    distance += d
    augmented += d + (lambda * (penalties[c1][c2]))
  end
  return [distance, augmented]
end

def cost(cand, penalties, cities, lambda)
  cost, acost = augmented_cost(cand[:vector], penalties, cities, lambda)
  cand[:cost], cand[:aug_cost] = cost, acost
end

# 2-opt local search guided by the augmented cost
def local_search(current, cities, penalties, max_no_improv, lambda)
  cost(current, penalties, cities, lambda)
  count = 0
  begin
    candidate = {:vector=> stochastic_two_opt(current[:vector])}
    cost(candidate, penalties, cities, lambda)
    count = (candidate[:aug_cost] < current[:aug_cost]) ? 0 : count+1
    current = candidate if candidate[:aug_cost] < current[:aug_cost]
  end until count >= max_no_improv
  return current
end

# utility of penalizing each edge: distance / (1 + penalty)
def calculate_feature_utilities(penal, cities, permutation)
  utilities = Array.new(permutation.size,0)
  permutation.each_with_index do |c1, i|
    c2 = (i==permutation.size-1) ? permutation[0] : permutation[i+1]
    c1, c2 = c2, c1 if c2 < c1
    utilities[i] = euc_2d(cities[c1], cities[c2]) / (1.0 +
      penal[c1][c2])
  end
  return utilities
end

# add 1 to the penalty of every edge with maximal utility
def update_penalties!(penalties, cities, permutation, utilities)
  max = utilities.max()
  permutation.each_with_index do |c1, i|
    c2 = (i==permutation.size-1) ? permutation[0] : permutation[i+1]
    c1, c2 = c2, c1 if c2 < c1
    penalties[c1][c2] += 1 if utilities[i] == max
  end
  return penalties
end

def search(max_iterations, cities, max_no_improv, lambda)
  current = {:vector=>random_permutation(cities)}
  best = nil
  penalties = Array.new(cities.size){ Array.new(cities.size, 0) }
  max_iterations.times do |iter|
    current = local_search(current, cities, penalties, max_no_improv,
      lambda)
    utilities = calculate_feature_utilities(penalties, cities,
      current[:vector])
    update_penalties!(penalties, cities, current[:vector], utilities)
    # the best solution is tracked by the unaugmented cost
    best = current if best.nil? or current[:cost] < best[:cost]
    puts " > iter=#{(iter+1)}, best=#{best[:cost]}, aug=#{best[:aug_cost]}"
  end
  return best
end

if __FILE__ == $0
  # problem configuration
  berlin52 = [[565,575],[25,185],[345,750],[945,685],[845,655],
   [880,660],[25,230],[525,1000],[580,1175],[650,1130],[1605,620],
   [1220,580],[1465,200],[1530,5],[845,680],[725,370],[145,665],
   [415,635],[510,875],[560,365],[300,465],[520,585],[480,415],
   [835,625],[975,580],[1215,245],[1320,315],[1250,400],[660,180],
   [410,250],[420,555],[575,665],[1150,1160],[700,580],[685,595],
   [685,610],[770,610],[795,645],[720,635],[760,650],[475,960],
   [95,260],[875,920],[700,500],[555,815],[830,485],[1170,65],
   [830,610],[605,625],[595,360],[1340,725],[1740,245]]
  # algorithm configuration
  max_iterations = 150
  max_no_improv = 20
  alpha = 0.3
  local_search_optima = 12000.0
  lambda = alpha * (local_search_optima/berlin52.size.to_f)
  # execute the algorithm
  best = search(max_iterations, berlin52, max_no_improv, lambda)
  puts "Done. Best Solution: c=#{best[:cost]}, v=#{best[:vector].inspect}"
end



Listing 2.5: Guided Local Search in Ruby 



2.6.6 References 
Primary Sources 

Guided Local Search emerged from an approach called GENET, which 
is a connectionist approach to constraint satisfaction [6, 13]. Guided 
Local Search was presented by Voudouris and Tsang in a series of 
technical reports (that were later published) that described the technique 
and provided example applications of it to constraint satisfaction [8], 
combinatorial optimization [5, 10], and function optimization [9]. The
seminal work on the technique was Voudouris' PhD dissertation [7]. 

Learn More 

Voudouris and Tsang provide a high-level introduction to the technique 
[11], and a contemporary summary of the approach in Glover and 
Kochenberger's 'Handbook of metaheuristics' [12] that includes a review 
of the technique, application areas, and demonstration applications on a 
diverse set of problem instances. Mills et al. elaborated on the approach, 
devising an 'Extended Guided Local Search' (EGLS) technique that 
added 'aspiration criteria' and random moves to the procedure [4], work 
which culminated in Mills' PhD dissertation [3]. Lau and Tsang further 
extended the approach by integrating it with a Genetic Algorithm, called 
the 'Guided Genetic Algorithm' (GGA) [2], that also culminated in a 
PhD dissertation by Lau [1]. 
2.7 Variable Neighborhood Search

Variable Neighborhood Search, VNS. 

2.7.1 Taxonomy 

Variable Neighborhood Search is a Metaheuristic and a Global
Optimization technique that manages a Local Search technique. It is
related to the Iterated Local Search algorithm (Section 2.5).

2.7.2 Strategy 

The strategy for the Variable Neighborhood Search involves iterative
exploration of larger and larger neighborhoods for a given local optimum
until an improvement is located, after which time the search across
expanding neighborhoods is repeated. The strategy is motivated by
three principles: 1) a local minimum for one neighborhood structure
may not be a local minimum for a different neighborhood structure,
2) a global minimum is a local minimum for all possible neighborhood
structures, and 3) local minima are relatively close to global minima for
many problem classes.
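
The strategy can be sketched on a toy problem. Everything in this sketch is an assumption made for illustration: the one-dimensional cost function over the integers, the use of 'all points within a given radius' as the neighborhood structure, the radii, and the stopping rule based on consecutive non-improving passes.

```ruby
# Toy multimodal cost over the integers (illustrative only).
def cost(x)
  (x * x) / 10.0 + 5.0 * Math.sin(x)
end

# Simple descent: move to the better of x-1 and x+1 until neither improves.
def local_search(x)
  loop do
    best = [x - 1, x + 1].min_by { |n| cost(n) }
    return x unless cost(best) < cost(x)
    x = best
  end
end

# Neighborhood k is the set of points within radii[k] of the best solution.
def vns(start, radii, max_no_improv, rng)
  best = local_search(start)
  count = 0
  while count < max_no_improv
    improved = false
    radii.each do |radius|
      # sample a point in the current neighborhood, then refine it
      candidate = local_search(best + rng.rand(-radius..radius))
      if cost(candidate) < cost(best)
        best, improved = candidate, true
        break   # restart from the smallest neighborhood
      end
    end
    count = improved ? 0 : count + 1
  end
  best
end

start = 12
result = vns(start, [1, 2, 4, 8], 20, Random.new(3))
puts "start optimum=#{local_search(start)}, result=#{result}, " +
     "cost=#{cost(result).round(3)}"
```

Small radii are tried first, so the search only jumps further from the current local optimum when nearby samples fail to improve on it.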

2.7.3 Procedure 

Algorithm 2.7.1 provides a pseudocode listing of the Variable
Neighborhood Search algorithm for minimizing a cost function. The
pseudocode shows that the systematic search of expanding neighborhoods
for a local optimum is abandoned when a global improvement is achieved
(shown with the Break jump).

2.7.4 Heuristics 

• Approximation methods (such as stochastic hill climbing) are 
suggested for use as the Local Search procedure for large problem 
instances in order to reduce the running time. 

• Variable Neighborhood Search has been applied to a very wide 
array of combinatorial optimization problems as well as clustering 
and continuous function optimization problems. 

• The embedded Local Search technique should be specialized to 
the problem type and instance to which the technique is being 
applied. 

• The Variable Neighborhood Descent (VND) can be embedded in
the Variable Neighborhood Search as the Local Search procedure
and has been shown to be most effective.
Algorithm 2.7.1: Pseudocode for VNS.
Input: Neighborhoods
Output: S_best

S_best ← RandomSolution();
while ¬ StopCondition() do
    foreach Neighborhood_i ∈ Neighborhoods do
        Neighborhood_curr ← CalculateNeighborhood(S_best,
            Neighborhood_i);
        S_candidate ← RandomSolutionInNeighborhood(Neighborhood_curr);
        S_candidate ← LocalSearch(S_candidate);
        if Cost(S_candidate) < Cost(S_best) then
            S_best ← S_candidate;
            Break;
        end
    end
end
return S_best;



2.7.5 Code Listing 

Listing 2.6 provides an example of the Variable Neighborhood Search
algorithm implemented in the Ruby Programming Language. The
algorithm is applied to the Berlin52 instance of the Traveling Salesman
Problem (TSP), taken from the TSPLIB. The problem seeks a
permutation of the order to visit cities (called a tour) that minimizes
the total distance traveled. The optimal tour distance for the Berlin52
instance is 7542 units.

The Variable Neighborhood Search uses a stochastic 2-opt procedure 
as the embedded local search. The procedure deletes two edges and 
reverses the sequence in-between the deleted edges, potentially removing 
'twists' in the tour. The neighborhood structure used in the search is 
the number of times the 2-opt procedure is performed on a permutation, 
between 1 and 20 times. The stopping condition for the local search 
procedure is a maximum number of iterations without improvement. 
The same stop condition is employed by the higher-order Variable 
Neighborhood Search procedure, although with a lower boundary on 
the number of non-improving iterations. 
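
The effect of a single 2-opt move can be seen in isolation. The
following is a minimal sketch, assuming a deterministic variant in which
the two cut positions are passed in rather than drawn at random; the
names two_opt_move and tour_length and the unit-square data are
illustrative assumptions and are not part of Listing 2.6.

```ruby
# Reverse the tour segment from position c1 (inclusive) to c2 (exclusive).
# Returns a new array and leaves the input permutation unchanged.
def two_opt_move(perm, c1, c2)
  c1, c2 = c2, c1 if c2 < c1
  result = Array.new(perm)
  result[c1...c2] = result[c1...c2].reverse
  result
end

# Length of the closed tour that visits the points in the given order.
def tour_length(perm, points)
  total = 0.0
  perm.each_with_index do |c, i|
    a = points[c]
    b = points[perm[(i + 1) % perm.size]]
    total += Math.sqrt((a[0] - b[0])**2 + (a[1] - b[1])**2)
  end
  total
end

# Corners of a unit square; visiting them as 0, 2, 1, 3 crosses both diagonals.
square = [[0, 0], [1, 0], [1, 1], [0, 1]]
twisted = [0, 2, 1, 3]
untwisted = two_opt_move(twisted, 1, 3)

puts untwisted.inspect                  # [0, 1, 2, 3]
puts tour_length(twisted, square)       # about 4.83
puts tour_length(untwisted, square)     # 4.0
```

Reversing the middle segment replaces the two diagonal edges with two
sides of the square, which removes the 'twist' and shortens the tour.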



# Euclidean distance between two cities, rounded to the nearest integer
def euc_2d(c1, c2)
  Math.sqrt((c1[0] - c2[0])**2.0 + (c1[1] - c2[1])**2.0).round
end

# Total length of the closed tour described by the permutation
def cost(perm, cities)
  distance = 0
  perm.each_with_index do |c1, i|
    c2 = (i==perm.size-1) ? perm[0] : perm[i+1]
    distance += euc_2d(cities[c1], cities[c2])
  end
  return distance
end

def random_permutation(cities)
  perm = Array.new(cities.size){|i| i}
  perm.each_index do |i|
    r = rand(perm.size-i) + i
    perm[r], perm[i] = perm[i], perm[r]
  end
  return perm
end

# Reverse a randomly chosen segment of the tour in place
def stochastic_two_opt!(perm)
  c1, c2 = rand(perm.size), rand(perm.size)
  exclude = [c1]
  exclude << ((c1==0) ? perm.size-1 : c1-1)
  exclude << ((c1==perm.size-1) ? 0 : c1+1)
  c2 = rand(perm.size) while exclude.include?(c2)
  c1, c2 = c2, c1 if c2 < c1
  perm[c1...c2] = perm[c1...c2].reverse
  return perm
end

def local_search(best, cities, max_no_improv, neighborhood)
  count = 0
  begin
    candidate = {}
    candidate[:vector] = Array.new(best[:vector])
    neighborhood.times{stochastic_two_opt!(candidate[:vector])}
    candidate[:cost] = cost(candidate[:vector], cities)
    if candidate[:cost] < best[:cost]
      count, best = 0, candidate
    else
      count += 1
    end
  end until count >= max_no_improv
  return best
end

def search(cities, neighborhoods, max_no_improv, max_no_improv_ls)
  best = {}
  best[:vector] = random_permutation(cities)
  best[:cost] = cost(best[:vector], cities)
  iter, count = 0, 0
  begin
    neighborhoods.each do |neigh|
      candidate = {}
      candidate[:vector] = Array.new(best[:vector])
      neigh.times{stochastic_two_opt!(candidate[:vector])}
      candidate[:cost] = cost(candidate[:vector], cities)
      candidate = local_search(candidate, cities, max_no_improv_ls, neigh)
      puts " > iteration #{(iter+1)}, neigh=#{neigh}, best=#{best[:cost]}"
      iter += 1
      if candidate[:cost] < best[:cost]
        best, count = candidate, 0
        puts "New best, restarting neighborhood search."
        break
      else
        count += 1
      end
    end
  end until count >= max_no_improv
  return best
end

if __FILE__ == $0
  # problem configuration
  berlin52 = [[565,575],[25,185],[345,750],[945,685],[845,655],
   [880,660],[25,230],[525,1000],[580,1175],[650,1130],[1605,620],
   [1220,580],[1465,200],[1530,5],[845,680],[725,370],[145,665],
   [415,635],[510,875],[560,365],[300,465],[520,585],[480,415],
   [835,625],[975,580],[1215,245],[1320,315],[1250,400],[660,180],
   [410,250],[420,555],[575,665],[1150,1160],[700,580],[685,595],
   [685,610],[770,610],[795,645],[720,635],[760,650],[475,960],
   [95,260],[875,920],[700,500],[555,815],[830,485],[1170,65],
   [830,610],[605,625],[595,360],[1340,725],[1740,245]]
  # algorithm configuration
  max_no_improv = 50
  max_no_improv_ls = 70
  neighborhoods = 1...20
  # execute the algorithm
  best = search(berlin52, neighborhoods, max_no_improv, max_no_improv_ls)
  puts "Done. Best Solution: c=#{best[:cost]}, v=#{best[:vector].inspect}"
end



Listing 2.6: Variable Neighborhood Search in Ruby 



2.7.6 References 
Primary Sources 

The seminal paper for describing Variable Neighborhood Search was
by Mladenovic and Hansen in 1997 [7], although an early abstract by
Mladenovic is sometimes cited [6]. The approach is explained in terms of
three different variations on the general theme. Variable Neighborhood
Descent (VND) refers to the use of a Local Search procedure and the
deterministic (as opposed to stochastic or probabilistic) change of
neighborhood size. Reduced Variable Neighborhood Search (RVNS) involves
performing a stochastic random search within a neighborhood and no
refinement via a local search technique. Basic Variable Neighborhood
Search is the canonical approach described by Mladenovic and Hansen
in the seminal paper.

Learn More 

There are a large number of papers published on Variable Neighborhood 
Search, its applications and variations. Hansen and Mladenovic provide 
an overview of the approach that includes its recent history, extensions 
and a detailed review of the numerous areas of application [4]. For 
some additional useful overviews of the technique, its principles, and 
applications, see [1-3]. 

There are many extensions to Variable Neighborhood Search. Some
popular examples include: Variable Neighborhood Decomposition Search
(VNDS), which embeds a second heuristic or metaheuristic
approach in VNS to replace the Local Search procedure [5]; Skewed
Variable Neighborhood Search (SVNS), which encourages exploration of
neighborhoods far away from discovered local optima; and Parallel
Variable Neighborhood Search (PVNS), which either parallelizes the local
search of a neighborhood or parallelizes the searching of the
neighborhoods themselves.

2.8 Greedy Randomized Adaptive Search

Greedy Randomized Adaptive Search Procedure, GRASP. 

2.8.1 Taxonomy 

The Greedy Randomized Adaptive Search Procedure is a Metaheuristic
and Global Optimization algorithm, originally proposed for Operations
Research practitioners. The iterative application of an embedded
Local Search technique relates the approach to Iterated Local Search
(Section 2.5) and Multi-Start techniques.

2.8.2 Strategy 

The objective of the Greedy Randomized Adaptive Search Procedure is
to repeatedly sample stochastically greedy solutions, and then use a local
search procedure to refine them to a local optimum. The strategy of the
procedure is centered on the stochastic and greedy step-wise construction
mechanism that constrains the selection and order-of-inclusion of the
components of a solution based on the value they are expected to provide.

2.8.3 Procedure 

Algorithm 2.8.1 provides a pseudocode listing of the Greedy Randomized 
Adaptive Search Procedure for minimizing a cost function. 



Algorithm 2.8.1: Pseudocode for the GRASP.

Input: α
Output: S_best

S_best ← ConstructRandomSolution();
while ¬StopCondition() do
  S_candidate ← GreedyRandomizedConstruction(α);
  S_candidate ← LocalSearch(S_candidate);
  if Cost(S_candidate) < Cost(S_best) then
    S_best ← S_candidate;
  end
end
return S_best;



Algorithm 2.8.2 provides the pseudocode for the Greedy Randomized
Construction function. The function involves the step-wise construction
of a candidate solution using a stochastically greedy construction process.
The function works by building a Restricted Candidate List (RCL) that
constrains the components of a solution (features) that may be selected
in each cycle. The RCL may be constrained by an explicit size, or by
using a threshold (α ∈ [0, 1]) on the cost of adding each feature to the
current candidate solution.

Algorithm 2.8.2: Pseudocode for the GreedyRandomizedConstruction
function.

Input: α
Output: S_candidate

S_candidate ← ∅;
while S_candidate ≠ ProblemSize do
  Feature_costs ← ∅;
  for Feature_i ∉ S_candidate do
    Feature_costs ← CostOfAddingFeatureToSolution(S_candidate, Feature_i);
  end
  RCL ← ∅;
  Fcost_min ← MinCost(Feature_costs);
  Fcost_max ← MaxCost(Feature_costs);
  for F_i cost ∈ Feature_costs do
    if F_i cost ≤ Fcost_min + α · (Fcost_max − Fcost_min) then
      RCL ← Feature_i;
    end
  end
  S_candidate ← SelectRandomFeature(RCL);
end
return S_candidate;



2.8.4 Heuristics 

• The α threshold defines the amount of greediness of the
construction mechanism, where values close to 0 may be too greedy, and
values close to 1 may be too generalized.

• As an alternative to using the α threshold, the RCL can be
constrained to the top n% of candidate features that may be
selected from each construction cycle.

• The technique was designed for discrete problem classes such as 
combinatorial optimization problems. 
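
The two ways of constraining the RCL described above can be sketched as
follows. This is a minimal example under stated assumptions: the feature
costs are given as a hash from feature to incremental cost, and the
function names and the cost values are hypothetical and do not come from
Listing 2.7.

```ruby
# Restricted Candidate List using the alpha threshold on cost.
# alpha = 0.0 keeps only the cheapest features (purely greedy);
# alpha = 1.0 keeps every feature (purely random).
def restricted_candidate_list(costs, alpha)
  min, max = costs.values.min, costs.values.max
  threshold = min + alpha * (max - min)
  costs.select { |_feature, c| c <= threshold }.keys
end

# Alternative: keep the cheapest fraction of the features (at least one).
def restricted_candidate_list_by_fraction(costs, fraction)
  keep = [(costs.size * fraction).ceil, 1].max
  costs.sort_by { |_feature, c| c }.first(keep).map { |feature, _c| feature }
end

# Hypothetical costs of adding each remaining feature to a partial solution.
costs = { :a => 10, :b => 12, :c => 20, :d => 30 }

puts restricted_candidate_list(costs, 0.0).inspect             # [:a]
puts restricted_candidate_list(costs, 0.3).inspect             # [:a, :b]
puts restricted_candidate_list(costs, 1.0).inspect             # [:a, :b, :c, :d]
puts restricted_candidate_list_by_fraction(costs, 0.5).inspect # [:a, :b]
```

A feature is then drawn uniformly at random from the returned list and
appended to the candidate solution, after which the costs are recomputed
for the next construction cycle.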

2.8.5 Code Listing 

Listing 2.7 provides an example of the Greedy Randomized Adaptive
Search Procedure implemented in the Ruby Programming Language.
The algorithm is applied to the Berlin52 instance of the Traveling
Salesman Problem (TSP), taken from the TSPLIB. The problem seeks a
permutation of the order to visit cities (called a tour) that minimizes the
total distance traveled. The optimal tour distance for the Berlin52
instance is 7542 units.

The stochastic and greedy step-wise construction of a tour involves
evaluating candidate cities by the cost they contribute as being the
next city in the tour. The algorithm uses a stochastic 2-opt procedure
for the Local Search with a fixed number of non-improving iterations as
the stopping condition.

# Euclidean distance between two cities, rounded to the nearest integer
def euc_2d(c1, c2)
  Math.sqrt((c1[0] - c2[0])**2.0 + (c1[1] - c2[1])**2.0).round
end

# Total length of the closed tour described by the permutation
def cost(perm, cities)
  distance = 0
  perm.each_with_index do |c1, i|
    c2 = (i==perm.size-1) ? perm[0] : perm[i+1]
    distance += euc_2d(cities[c1], cities[c2])
  end
  return distance
end

# Return a copy of the tour with a randomly chosen segment reversed
def stochastic_two_opt(permutation)
  perm = Array.new(permutation)
  c1, c2 = rand(perm.size), rand(perm.size)
  exclude = [c1]
  exclude << ((c1==0) ? perm.size-1 : c1-1)
  exclude << ((c1==perm.size-1) ? 0 : c1+1)
  c2 = rand(perm.size) while exclude.include?(c2)
  c1, c2 = c2, c1 if c2 < c1
  perm[c1...c2] = perm[c1...c2].reverse
  return perm
end

def local_search(best, cities, max_no_improv)
  count = 0
  begin
    candidate = {:vector=>stochastic_two_opt(best[:vector])}
    candidate[:cost] = cost(candidate[:vector], cities)
    count = (candidate[:cost] < best[:cost]) ? 0 : count+1
    best = candidate if candidate[:cost] < best[:cost]
  end until count >= max_no_improv
  return best
end

def construct_randomized_greedy_solution(cities, alpha)
  candidate = {}
  candidate[:vector] = [rand(cities.size)]
  allCities = Array.new(cities.size) {|i| i}
  while candidate[:vector].size < cities.size
    candidates = allCities - candidate[:vector]
    # cost of visiting each unvisited city next; i indexes candidates,
    # so the city must be looked up through candidates[i]
    costs = Array.new(candidates.size) do |i|
      euc_2d(cities[candidate[:vector].last], cities[candidates[i]])
    end
    # restricted candidate list: cities within the alpha threshold
    rcl, max, min = [], costs.max, costs.min
    costs.each_with_index do |c,i|
      rcl << candidates[i] if c <= (min + alpha*(max-min))
    end
    candidate[:vector] << rcl[rand(rcl.size)]
  end
  candidate[:cost] = cost(candidate[:vector], cities)
  return candidate
end

def search(cities, max_iter, max_no_improv, alpha)
  best = nil
  max_iter.times do |iter|
    candidate = construct_randomized_greedy_solution(cities, alpha)
    candidate = local_search(candidate, cities, max_no_improv)
    best = candidate if best.nil? or candidate[:cost] < best[:cost]
    puts " > iteration #{(iter+1)}, best=#{best[:cost]}"
  end
  return best
end

if __FILE__ == $0
  # problem configuration
  berlin52 = [[565,575],[25,185],[345,750],[945,685],[845,655],
   [880,660],[25,230],[525,1000],[580,1175],[650,1130],[1605,620],
   [1220,580],[1465,200],[1530,5],[845,680],[725,370],[145,665],
   [415,635],[510,875],[560,365],[300,465],[520,585],[480,415],
   [835,625],[975,580],[1215,245],[1320,315],[1250,400],[660,180],
   [410,250],[420,555],[575,665],[1150,1160],[700,580],[685,595],
   [685,610],[770,610],[795,645],[720,635],[760,650],[475,960],
   [95,260],[875,920],[700,500],[555,815],[830,485],[1170,65],
   [830,610],[605,625],[595,360],[1340,725],[1740,245]]
  # algorithm configuration
  max_iter = 50
  max_no_improv = 50
  greediness_factor = 0.3
  # execute the algorithm
  best = search(berlin52, max_iter, max_no_improv, greediness_factor)
  puts "Done. Best Solution: c=#{best[:cost]}, v=#{best[:vector].inspect}"
end

Listing 2.7: Greedy Randomized Adaptive Search Procedure in Ruby 

2.8.6 References 
Primary Sources 

The seminal paper that introduces the general approach of stochastic
and greedy step-wise construction of candidate solutions is by Feo and
Resende [3]. The general approach was inspired by greedy heuristics by
Hart and Shogan [9]. The seminal review paper that is cited with the
preliminary paper is by Feo and Resende [4], and provides a coherent
description of the GRASP technique, an example, and review of early
applications. An early application was by Feo, Venkatraman and Bard
for a machine scheduling problem [7]. Other early applications to
scheduling problems include technical reports [2] (later published as [1])
and [5] (also later published as [6]).

Learn More 

There are a vast number of review, application, and extension papers
for GRASP. Pitsoulis and Resende provide an extensive contemporary
overview of the field as a review chapter [11], as do Resende and
Ribeiro, whose review includes a clear presentation of the use of the α
threshold parameter instead of a fixed size for the RCL [13]. Festa and
Resende provide an annotated bibliography as a review chapter that
provides some needed insight into the large amount of study that has
gone into the approach [8]. There are numerous extensions to GRASP,
not limited to the popular Reactive GRASP for adapting α [12], the use
of long term memory to allow the technique to learn from candidate
solutions discovered in previous iterations, and parallel implementations
of the procedure such as 'Parallel GRASP' [10].

2.9 Scatter Search

Scatter Search, SS. 

2.9.1 Taxonomy 

Scatter search is a Metaheuristic and a Global Optimization algorithm. It 
is also sometimes associated with the field of Evolutionary Computation 
given the use of a population and recombination in the structure of the 
technique. Scatter Search is a sibling of Tabu Search (Section 2.10), 
developed by the same author and based on similar origins. 

2.9.2 Strategy 

The objective of Scatter Search is to maintain a set of diverse and
high-quality candidate solutions. The principle of the approach is that
useful information about the global optima is stored in a diverse and
elite set of solutions (the reference set) and that recombining samples
from the set can exploit this information. The strategy involves an
iterative process in which a population of diverse and high-quality
candidate solutions is partitioned into subsets and linearly recombined
to create weighted centroids of sample-based neighborhoods. The results
of recombination are refined using an embedded heuristic and assessed
in the context of the reference set as to whether or not they are retained.
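
The idea of a weighted centroid of a subset can be made concrete with a
short sketch. The following is a minimal example, assuming real-valued
solution vectors and non-negative weights that are normalized to sum to
one; the function name weighted_centroid and the sample vectors are
hypothetical, and this is not the recombination operator used in the
listing of this section.

```ruby
# Weighted centroid of a subset of real-valued vectors.
# The weights are normalized so that they sum to one.
def weighted_centroid(vectors, weights)
  total = weights.inject(0.0) { |sum, w| sum + w }
  Array.new(vectors.first.size) do |i|
    vectors.zip(weights).inject(0.0) { |sum, (v, w)| sum + v[i] * (w / total) }
  end
end

# A subset of two hypothetical reference solutions.
subset = [[0.0, 0.0], [4.0, 2.0]]

puts weighted_centroid(subset, [1, 1]).inspect  # [2.0, 1.0], the midpoint
puts weighted_centroid(subset, [3, 1]).inspect  # [1.0, 0.5], biased to the first
```

Varying the weights moves the recombined point along the line between the
subset members, which is one way of sampling the neighborhood that the
subset defines.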

2.9.3 Procedure 

Algorithm 2.9.1 provides a pseudocode listing of the Scatter Search
algorithm for minimizing a cost function. The procedure is based on the
abstract form presented by Glover as a template for the general class of
technique [3], with influences from an application of the technique to
function optimization by Glover [3].

2.9.4 Heuristics 

• Scatter search is suitable for both discrete domains such as
combinatorial optimization as well as continuous domains such as
non-linear programming (continuous function optimization).

• Small set sizes are preferred for the ReferenceSet, such as 10 or
20 members.

• Subset sizes can be 2, 3, 4 or more members that are all recombined
to produce viable candidate solutions within the neighborhood of
the members of the subset.

Algorithm 2.9.1: Pseudocode for Scatter Search.

Input: DiverseSet_size, ReferenceSet_size
Output: ReferenceSet

InitialSet ← ConstructInitialSolution(DiverseSet_size);
RefinedSet ← ∅;
for S_i ∈ InitialSet do
  RefinedSet ← LocalSearch(S_i);
end
ReferenceSet ← SelectInitialReferenceSet(ReferenceSet_size);
while ¬StopCondition() do
  Subsets ← SelectSubset(ReferenceSet);
  CandidateSet ← ∅;
  for Subset_i ∈ Subsets do
    RecombinedCandidates ← RecombineMembers(Subset_i);
    for S_i ∈ RecombinedCandidates do
      CandidateSet ← LocalSearch(S_i);
    end
  end
  ReferenceSet ← Select(ReferenceSet, CandidateSet, ReferenceSet_size);
end
return ReferenceSet;



• Each subset should comprise at least one member added to the 
set in the previous algorithm iteration. 

• The Local Search procedure should be a problem-specific improve- 
ment heuristic. 



• The selection of members for the ReferenceSet at the end of
each iteration favors solutions with higher quality and may also
promote diversity.

• The ReferenceSet may be updated at the end of an iteration, or
dynamically as candidates are created (a so-called steady-state
population in some evolutionary computation literature).

• A lack of changes to the ReferenceSet may be used as a signal
to stop the current search, and potentially restart the search with
a newly initialized ReferenceSet.
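
The steady-state style of update mentioned in the heuristics above can
be sketched as follows. This is a minimal example under stated
assumptions: solutions are hashes with :vector and :cost keys, only
quality (not diversity) is considered, and the function name
update_reference_set! and the sample values are hypothetical.

```ruby
# Offer a candidate to the reference set. The candidate replaces the worst
# member when it is not a duplicate and has a strictly lower cost.
# Returns true when the reference set was changed.
def update_reference_set!(ref_set, candidate)
  return false if ref_set.any? { |m| m[:vector] == candidate[:vector] }
  worst = ref_set.max_by { |m| m[:cost] }
  return false unless candidate[:cost] < worst[:cost]
  ref_set.delete_at(ref_set.index(worst))
  ref_set << candidate
  true
end

# Hypothetical reference set with two members.
ref_set = [
  { :vector => [0.1, 0.2], :cost => 0.05 },
  { :vector => [1.0, 1.0], :cost => 2.0 }
]

puts update_reference_set!(ref_set, { :vector => [0.5, 0.5], :cost => 0.5 })   # true
puts update_reference_set!(ref_set, { :vector => [0.5, 0.5], :cost => 0.5 })   # false
puts update_reference_set!(ref_set, { :vector => [3.0, 3.0], :cost => 18.0 })  # false
puts ref_set.map { |m| m[:cost] }.sort.inspect                                 # [0.05, 0.5]
```

An iteration in which every call returns false corresponds to the lack
of changes to the ReferenceSet that may be used as a stopping or restart
signal.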

2.9.5 Code Listing

Listing 2.8 provides an example of the Scatter Search algorithm
implemented in the Ruby Programming Language. The example problem is
an instance of a continuous function optimization that seeks $\min f(x)$
where $f = \sum_{i=1}^{n} x_i^2$, $-5.0 \le x_i \le 5.0$ and $n = 3$. The
optimal solution for this basin function is $(v_1, \ldots, v_n) = 0.0$.

The algorithm is an implementation of Scatter Search as described in
an application of the technique to unconstrained non-linear optimization
by Glover [6]. The seeds for initial solutions are generated as random
vectors, as opposed to stratified samples. The example was further
simplified by not including a restart strategy, and the exclusion of
diversity maintenance in the ReferenceSet. A stochastic local search
algorithm is used as the embedded heuristic that uses a stochastic step
size in the range of half a percent of the search space.

# Sum of squares (basin function)
def objective_function(vector)
  return vector.inject(0) {|sum, x| sum + (x ** 2.0)}
end

def rand_in_bounds(min, max)
  return min + ((max-min) * rand())
end

def random_vector(minmax)
  return Array.new(minmax.size) do |i|
    rand_in_bounds(minmax[i][0], minmax[i][1])
  end
end

# Sample a point within step_size of the current position, inside the bounds
def take_step(minmax, current, step_size)
  position = Array.new(current.size)
  position.size.times do |i|
    min = [minmax[i][0], current[i]-step_size].max
    max = [minmax[i][1], current[i]+step_size].min
    position[i] = rand_in_bounds(min, max)
  end
  return position
end

def local_search(best, bounds, max_no_improv, step_size)
  count = 0
  begin
    candidate = {:vector=>take_step(bounds, best[:vector], step_size)}
    candidate[:cost] = objective_function(candidate[:vector])
    count = (candidate[:cost] < best[:cost]) ? 0 : count+1
    best = candidate if candidate[:cost] < best[:cost]
  end until count >= max_no_improv
  return best
end

def construct_initial_set(bounds, set_size, max_no_improv, step_size)
  diverse_set = []
  begin
    cand = {:vector=>random_vector(bounds)}
    cand[:cost] = objective_function(cand[:vector])
    cand = local_search(cand, bounds, max_no_improv, step_size)
    diverse_set << cand if !diverse_set.any? {|x| x[:vector]==cand[:vector]}
  end until diverse_set.size == set_size
  return diverse_set
end

def euclidean_distance(c1, c2)
  sum = 0.0
  c1.each_index {|i| sum += (c1[i]-c2[i])**2.0}
  return Math.sqrt(sum)
end

# Summed distance from a vector to every member of a set
def distance(v, set)
  return set.inject(0){|s,x| s + euclidean_distance(v, x[:vector])}
end

# Build the reference set from the elite solutions plus the most distant others
def diversify(diverse_set, num_elite, ref_set_size)
  diverse_set.sort!{|x,y| x[:cost] <=> y[:cost]}
  ref_set = Array.new(num_elite){|i| diverse_set[i]}
  remainder = diverse_set - ref_set
  remainder.each{|c| c[:dist] = distance(c[:vector], ref_set)}
  remainder.sort!{|x,y| y[:dist]<=>x[:dist]}
  ref_set = ref_set + remainder.first(ref_set_size-ref_set.size)
  return [ref_set, ref_set[0]]
end

# Pair each newly added member with the other members of the reference set
def select_subsets(ref_set)
  additions = ref_set.select{|c| c[:new]}
  remainder = ref_set - additions
  remainder = additions if remainder.nil? or remainder.empty?
  subsets = []
  additions.each do |a|
    remainder.each{|r| subsets << [a,r] if a!=r && !subsets.include?([r,a])}
  end
  return subsets
end

def recombine(subset, minmax)
  a, b = subset
  d = Array.new(a[:vector].size) {|i| (b[:vector][i]-a[:vector][i])/2.0}
  children = []
  subset.each do |p|
    direction, r = ((rand<0.5) ? +1.0 : -1.0), rand
    child = {:vector=>Array.new(minmax.size)}
    child[:vector].each_index do |i|
      child[:vector][i] = p[:vector][i] + (direction * r * d[i])
      child[:vector][i]=minmax[i][0] if child[:vector][i]<minmax[i][0]
      child[:vector][i]=minmax[i][1] if child[:vector][i]>minmax[i][1]
    end
    child[:cost] = objective_function(child[:vector])
    children << child
  end
  return children
end

def explore_subsets(bounds, ref_set, max_no_improv, step_size)
  was_change = false
  subsets = select_subsets(ref_set)
  ref_set.each{|c| c[:new] = false}
  subsets.each do |subset|
    candidates = recombine(subset, bounds)
    improved = Array.new(candidates.size) do |i|
      local_search(candidates[i], bounds, max_no_improv, step_size)
    end
    improved.each do |c|
      if !ref_set.any? {|x| x[:vector]==c[:vector]}
        c[:new] = true
        ref_set.sort!{|x,y| x[:cost] <=> y[:cost]}
        if c[:cost] < ref_set.last[:cost]
          ref_set.delete(ref_set.last)
          ref_set << c
          puts "  >> added, cost=#{c[:cost]}"
          was_change = true
        end
      end
    end
  end
  return was_change
end

def search(bounds, max_iter, ref_set_size, div_set_size, max_no_improv,
    step_size, max_elite)
  diverse_set = construct_initial_set(bounds, div_set_size,
    max_no_improv, step_size)
  ref_set, best = diversify(diverse_set, max_elite, ref_set_size)
  ref_set.each{|c| c[:new] = true}
  max_iter.times do |iter|
    was_change = explore_subsets(bounds, ref_set, max_no_improv,
      step_size)
    ref_set.sort!{|x,y| x[:cost] <=> y[:cost]}
    best = ref_set.first if ref_set.first[:cost] < best[:cost]
    puts " > iter=#{(iter+1)}, best=#{best[:cost]}"
    break if !was_change
  end
  return best
end

if __FILE__ == $0
  # problem configuration
  problem_size = 3
  bounds = Array.new(problem_size) {|i| [-5, +5]}
  # algorithm configuration
  max_iter = 100
  step_size = (bounds[0][1]-bounds[0][0])*0.005
  max_no_improv = 30
  ref_set_size = 10
  diverse_set_size = 20
  no_elite = 5
  # execute the algorithm
  best = search(bounds, max_iter, ref_set_size, diverse_set_size,
    max_no_improv, step_size, no_elite)
  puts "Done. Best Solution: c=#{best[:cost]}, v=#{best[:vector].inspect}"
end



Listing 2.8: Scatter Search in Ruby 



2.9.6 References 
Primary Sources 

A form of the Scatter Search algorithm was proposed by Glover for 
integer programming [1], based on Glover's earlier work on surrogate 
constraints. The approach remained idle until it was revisited by Glover 
and combined with Tabu Search [2]. The modern canonical reference of 
the approach was proposed by Glover who provides an abstract template 
of the procedure that may be specialized for a given application domain 
[3]. 

Learn More 

The primary reference for the approach is the book by Laguna and Marti
that reviews the principles of the approach in detail and presents tutorials
on applications of the approach on standard problems using the C
programming language [7]. There are many review articles and chapters
on Scatter Search that may be used to supplement an understanding of
the approach, such as a detailed review chapter by Glover [4], a review of
the fundamentals of the approach and its relationship to an abstraction
called 'path relinking' by Glover, Laguna, and Marti [5], and a modern
overview of the technique by Marti, Laguna, and Glover [8].

2.9.7 Bibliography 

[1] F. Glover. Heuristics for integer programming using surrogate con- 
straints. Decision Sciences, 8(1):156-166, 1977. 

[2] F. Glover. Tabu search for nonlinear and parametric optimization 
(with links to genetic algorithms). Discrete Applied Mathematics, 
49:231-255, 1994. 

[3] F. Glover. Artificial Evolution, chapter A Template For Scatter
Search And Path Relinking, page 13. Springer, 1998.

[4] F. Glover. New Ideas in Optimization, chapter Scatter search and
path relinking, pages 297-316. McGraw-Hill Ltd., 1999.

[6] F. Glover, M. Laguna, and R. Marti. Advances in Evolutionary
Computation: Theory and Applications, chapter Scatter Search,
pages 519-537. Springer-Verlag, 2003.

mentations in C. Kluwer Academic Publishers, 2003.

2.10 Tabu Search

Tabu Search, TS, Taboo Search. 

2.10.1 Taxonomy 

Tabu Search is a Global Optimization algorithm and a Metaheuristic or 
Meta-strategy for controlling an embedded heuristic technique. Tabu 
Search is a parent for a large family of derivative approaches that 
introduce memory structures in Metaheuristics, such as Reactive Tabu 
Search (Section 2.11) and Parallel Tabu Search. 

2.10.2 Strategy 

The objective for the Tabu Search algorithm is to constrain an embedded
heuristic from returning to recently visited areas of the search space,
referred to as cycling. The strategy of the approach is to maintain a
short term memory of the specific changes of recent moves within the
search space and to prevent future moves from undoing those changes.
Additional intermediate-term memory structures may be introduced
to bias moves toward promising areas of the search space, as well as
longer-term memory structures that promote a general diversity in the
search across the search space.

2.10.3 Procedure 

Algorithm 2.10.1 provides a pseudocode listing of the Tabu Search 
algorithm for minimizing a cost function. The listing shows the simple 
Tabu Search algorithm with short term memory, without intermediate 
and long term memory management. 

2.10.4 Heuristics 

• Tabu search was designed to manage an embedded hill climbing
heuristic, although it may be adapted to manage any neighborhood
exploration heuristic.

• Tabu search was designed for, and has predominantly been applied
to, discrete domains such as combinatorial optimization problems.

• Candidates for neighboring moves can be generated
deterministically for the entire neighborhood or the neighborhood can be
stochastically sampled to a fixed size, trading off efficiency for
accuracy.

Algorithm 2.10.1: Pseudocode for Tabu Search.

Input: TabuList_size
Output: S_best

S_best ← ConstructInitialSolution();
TabuList ← ∅;
while ¬StopCondition() do
  CandidateList ← ∅;
  for S_candidate ∈ S_best_neighborhood do
    if ¬ContainsAnyFeatures(S_candidate, TabuList) then
      CandidateList ← S_candidate;
    end
  end
  S_candidate ← LocateBestCandidate(CandidateList);
  if Cost(S_candidate) < Cost(S_best) then
    S_best ← S_candidate;
    TabuList ← FeatureDifferences(S_candidate, S_best);
    while TabuList > TabuList_size do
      DeleteFeature(TabuList);
    end
  end
end
return S_best;



• Intermediate-term memory structures can be introduced
(complementing the short-term memory) to focus the search on promising
areas of the search space (intensification), called aspiration criteria.

• Long-term memory structures can be introduced (complementing
the short-term memory) to encourage useful exploration of
the broader search space, called diversification. Strategies may
include generating solutions with rarely used components and
biasing the generation away from the most commonly used solution
components.
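
The short-term memory at the core of the approach can be sketched as a
fixed-size first-in, first-out list of forbidden features. The following
is a minimal example, assuming that the features are directed edges of a
tour and that the oldest entries are forgotten first; the class name
TabuList and its methods are hypothetical and are not part of Listing 2.9.

```ruby
# Fixed-size short-term memory of forbidden (directed) tour edges.
class TabuList
  def initialize(max_size)
    @max_size = max_size
    @edges = []
  end

  # Record an edge, forgetting the oldest entries beyond the capacity.
  def add(edge)
    @edges.push(edge)
    @edges.shift while @edges.size > @max_size
  end

  # True when the closed tour uses any edge currently on the list.
  def tabu?(perm)
    perm.each_with_index do |c1, i|
      c2 = perm[(i + 1) % perm.size]
      return true if @edges.include?([c1, c2])
    end
    false
  end

  def size
    @edges.size
  end
end

memory = TabuList.new(2)
memory.add([0, 1])
puts memory.tabu?([0, 1, 2])   # true: the tour uses edge 0 -> 1
puts memory.tabu?([0, 2, 1])   # false: edges are 0 -> 2, 2 -> 1, 1 -> 0

memory.add([2, 1])
memory.add([1, 0])             # capacity reached, edge 0 -> 1 is forgotten
puts memory.size               # 2
puts memory.tabu?([0, 1, 2])   # false
```

The size of the list controls how long a change stays forbidden: a
larger list prevents cycling over longer periods, at the cost of
excluding more of the neighborhood.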

2.10.5 Code Listing 

Listing 2.9 provides an example of the Tabu Search algorithm
implemented in the Ruby Programming Language. The algorithm is applied
to the Berlin52 instance of the Traveling Salesman Problem (TSP),
taken from the TSPLIB. The problem seeks a permutation of the order
to visit cities (called a tour) that minimizes the total distance traveled.
The optimal tour distance for the Berlin52 instance is 7542 units.

The algorithm is an implementation of the simple Tabu Search
with a short term memory structure that executes for a fixed number
of iterations. The starting point for the search is prepared using a
random permutation that is refined using a stochastic 2-opt Local Search
procedure. The stochastic 2-opt procedure is used as the embedded hill
climbing heuristic with a fixed sized candidate list. The two edges that
are deleted in each 2-opt move are stored on the tabu list. This general
approach is similar to that used by Knox in his work on Tabu Search
for symmetrical TSP [12] and Fiechter for the Parallel Tabu Search for
the TSP [2].



# rounded Euclidean distance between two cities
def euc_2d(c1, c2)
  Math.sqrt((c1[0] - c2[0])**2.0 + (c1[1] - c2[1])**2.0).round
end

# total length of the closed tour described by the permutation
def cost(perm, cities)
  distance = 0
  perm.each_with_index do |c1, i|
    c2 = (i==perm.size-1) ? perm[0] : perm[i+1]
    distance += euc_2d(cities[c1], cities[c2])
  end
  return distance
end

def random_permutation(cities)
  perm = Array.new(cities.size){|i| i}
  perm.each_index do |i|
    r = rand(perm.size-i) + i
    perm[r], perm[i] = perm[i], perm[r]
  end
  return perm
end

# returns the new permutation and the two edges deleted from the parent
def stochastic_two_opt(parent)
  perm = Array.new(parent)
  c1, c2 = rand(perm.size), rand(perm.size)
  exclude = [c1]
  exclude << ((c1==0) ? perm.size-1 : c1-1)
  exclude << ((c1==perm.size-1) ? 0 : c1+1)
  c2 = rand(perm.size) while exclude.include?(c2)
  c1, c2 = c2, c1 if c2 < c1
  perm[c1...c2] = perm[c1...c2].reverse
  return perm, [[parent[c1-1], parent[c1]], [parent[c2-1], parent[c2]]]
end

# a permutation is tabu if it contains any edge on the tabu list
def is_tabu?(permutation, tabu_list)
  permutation.each_with_index do |c1, i|
    c2 = (i==permutation.size-1) ? permutation[0] : permutation[i+1]
    tabu_list.each do |forbidden_edge|
      return true if forbidden_edge == [c1, c2]
    end
  end
  return false
end

def generate_candidate(best, tabu_list, cities)
  perm, edges = nil, nil
  begin
    perm, edges = stochastic_two_opt(best[:vector])
  end while is_tabu?(perm, tabu_list)
  candidate = {:vector=>perm}
  candidate[:cost] = cost(candidate[:vector], cities)
  return candidate, edges
end

def search(cities, tabu_list_size, candidate_list_size, max_iter)
  current = {:vector=>random_permutation(cities)}
  current[:cost] = cost(current[:vector], cities)
  best = current
  # start with an empty list; a list pre-filled with nil entries would
  # already be full and could never retain a real edge
  tabu_list = []
  max_iter.times do |iter|
    candidates = Array.new(candidate_list_size) do |i|
      generate_candidate(current, tabu_list, cities)
    end
    candidates.sort! {|x,y| x.first[:cost] <=> y.first[:cost]}
    best_candidate = candidates.first[0]
    best_candidate_edges = candidates.first[1]
    if best_candidate[:cost] < current[:cost]
      current = best_candidate
      best = best_candidate if best_candidate[:cost] < best[:cost]
      best_candidate_edges.each {|edge| tabu_list.push(edge)}
      # drop the oldest edges first so that recent moves stay tabu
      tabu_list.shift while tabu_list.size > tabu_list_size
    end
    puts " > iteration #{(iter+1)}, best=#{best[:cost]}"
  end
  return best
end

if __FILE__ == $0
  # problem configuration
  berlin52 = [[565,575],[25,185],[345,750],[945,685],[845,655],
   [880,660],[25,230],[525,1000],[580,1175],[650,1130],[1605,620],
   [1220,580],[1465,200],[1530,5],[845,680],[725,370],[145,665],
   [415,635],[510,875],[560,365],[300,465],[520,585],[480,415],
   [835,625],[975,580],[1215,245],[1320,315],[1250,400],[660,180],
   [410,250],[420,555],[575,665],[1150,1160],[700,580],[685,595],
   [685,610],[770,610],[795,645],[720,635],[760,650],[475,960],
   [95,260],[875,920],[700,500],[555,815],[830,485],[1170,65],
   [830,610],[605,625],[595,360],[1340,725],[1740,245]]
  # algorithm configuration
  max_iter = 100
  tabu_list_size = 15
  max_candidates = 50
  # execute the algorithm
  best = search(berlin52, tabu_list_size, max_candidates, max_iter)
  puts "Done. Best Solution: c=#{best[:cost]}, v=#{best[:vector].inspect}"
end



Listing 2.9: Tabu Search in Ruby 

2.10.6 References
Primary Sources 

Tabu Search was introduced by Glover, who applied it to scheduling
employees to duty rosters [9] and gave a more general overview in the
context of the TSP [5], based on his previous work on surrogate
constraints on integer programming problems [4]. Glover provided a
seminal overview of the algorithm in a two-part journal article: the
first part introduced the algorithm and reviewed then-recent
applications [6], and the second focused on advanced topics and open
areas of research [7].

Learn More 

Glover provides a high-level introduction to Tabu Search in the form
of a practical tutorial [8], as do Glover and Taillard in a user guide
format [10]. The best source of information for Tabu Search is the
book dedicated to the approach by Glover and Laguna that covers the
principles of the technique in detail as well as an in-depth review of
applications [11]. A modification of the approach for continuous
function optimization problems appeared in Science [1]. Finally,
Gendreau provides an excellent contemporary review of the algorithm,
highlighting best practices and application heuristics collected from
across the field of study [3].

2.10.7 Bibliography 
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salesman problems. Discrete Applied Mathematics, 3(6):243-267, 
1994. 
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tion to Tabu Search, pages 37-54. Springer, 2003. 
2.11 Reactive Tabu Search

Reactive Tabu Search, RTS, R-TABU, Reactive Taboo Search. 

2.11.1 Taxonomy 

Reactive Tabu Search is a Metaheuristic and a Global Optimization 
algorithm. It is an extension of Tabu Search (Section 2.10) and the 
basis for a field of reactive techniques called Reactive Local Search and 
more broadly the field of Reactive Search Optimization. 

2.11.2 Strategy 

The objective of Tabu Search is to avoid cycles while applying a local 
search technique. The Reactive Tabu Search addresses this objective 
by explicitly monitoring the search and reacting to the occurrence of 
cycles and their repetition by adapting the tabu tenure (tabu list size). 
The strategy of the broader field of Reactive Search Optimization is
to automate the process by which a practitioner configures a search
procedure by monitoring its online behavior and using machine learning
techniques to adapt the technique's configuration.

2.11.3 Procedure 

Algorithm 2.11.1 provides a pseudocode listing of the Reactive Tabu
Search algorithm for minimizing a cost function. The pseudocode is
based on the version of the Reactive Tabu Search described by Battiti
and Tecchiolli in [9] with supplements like the IsTabu function from [7].
The procedure has been modified for brevity to exclude the
diversification procedure (escape move). Algorithm 2.11.2 describes the
memory-based reaction that manipulates the size of the ProhibitionPeriod
in response to identified cycles in the ongoing search. Algorithm 2.11.3
describes the selection of the best move from a list of candidate moves
in the neighborhood of a given solution. The function permits prohibited
moves in the case where a prohibited move is better than both the best
known solution and the selected admissible move (called aspiration).
Algorithm 2.11.4 determines whether a given neighborhood move is tabu
based on the current ProhibitionPeriod, and is employed by sub-functions
of the BestMove function (Algorithm 2.11.3).
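
The aspiration rule described above can be sketched in a few lines of Ruby. This is an illustration only: it assumes the candidates have already been split into admissible and tabu lists, each sorted by ascending cost, and the name best_move and the :name and :cost keys are hypothetical.

```ruby
# Choose the next move from candidates already split into admissible and
# tabu lists (each sorted by ascending cost). A tabu move is accepted only
# when it beats both the best-known cost and the best admissible move.
def best_move(admissible, tabu, best_cost)
  chosen = admissible.first || tabu.first
  best_tabu = tabu.first
  if best_tabu && best_tabu[:cost] < best_cost && best_tabu[:cost] < chosen[:cost]
    chosen = best_tabu # aspiration: override the prohibition
  end
  chosen
end

admissible = [{:name => "a", :cost => 120}, {:name => "b", :cost => 130}]
tabu = [{:name => "t", :cost => 95}]

puts best_move(admissible, tabu, 100)[:name] # t: improves on the best-known cost of 100
puts best_move(admissible, tabu, 90)[:name]  # a: the tabu move is not a new best
```

When no admissible move exists, the sketch falls back to the best tabu move so that the search can continue.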

2.11.4 Heuristics 

• Reactive Tabu Search is an extension of Tabu Search and as such 
should exploit the best practices used for the parent algorithm. 

Algorithm 2.11.1: Pseudocode for Reactive Tabu Search.
Input: Iteration_max, Increase, Decrease, ProblemSize
Output: S_best

S_curr ← ConstructInitialSolution();
S_best ← S_curr;
TabuList ← ∅;
ProhibitionPeriod ← 1;
foreach Iteration_i ∈ Iteration_max do
    MemoryBasedReaction(Increase, Decrease, ProblemSize);
    CandidateList ← GenerateCandidateNeighborhood(S_curr);
    S_curr ← BestMove(CandidateList);
    TabuList ← S_curr_feature;
    if Cost(S_curr) < Cost(S_best) then
        S_best ← S_curr;
    end
end
return S_best;



Algorithm 2.11.2: Pseudocode for the MemoryBasedReaction function.
Input: Increase, Decrease, ProblemSize
Output:

if HaveVisitedSolutionBefore(S_curr, VisitedSolutions) then
    S_curr_t ← RetrieveLastTimeVisited(VisitedSolutions, S_curr);
    RepetitionInterval ← Iteration_i − S_curr_t;
    S_curr_t ← Iteration_i;
    if RepetitionInterval < 2 × ProblemSize then
        RepetitionInterval_avg ← 0.1 × RepetitionInterval +
            0.9 × RepetitionInterval_avg;
        ProhibitionPeriod ← ProhibitionPeriod × Increase;
        ProhibitionPeriod_t ← Iteration_i;
    end
else
    VisitedSolutions ← S_curr;
    S_curr_t ← Iteration_i;
end
if Iteration_i − ProhibitionPeriod_t > RepetitionInterval_avg then
    ProhibitionPeriod ← Max(1, ProhibitionPeriod × Decrease);
    ProhibitionPeriod_t ← Iteration_i;
end

Algorithm 2.11.3: Pseudocode for the BestMove function.
Input: ProblemSize
Output: S_curr

CandidateList_admissible ← GetAdmissibleMoves(CandidateList);
CandidateList_tabu ← CandidateList − CandidateList_admissible;
if Size(CandidateList_admissible) < 2 then
    ProhibitionPeriod ← ProblemSize − 2;
    ProhibitionPeriod_t ← Iteration_i;
end
S_curr ← GetBest(CandidateList_admissible);
S_besttabu ← GetBest(CandidateList_tabu);
if Cost(S_besttabu) < Cost(S_best) ∧ Cost(S_besttabu) < Cost(S_curr) then
    S_curr ← S_besttabu;
end
return S_curr;



Algorithm 2.11.4: Pseudocode for the IsTabu function.
Input:
Output: Tabu

Tabu ← FALSE;
S_curr_feature_t ← RetrieveTimeFeatureLastUsed(S_curr_feature);
if S_curr_feature_t > Iteration_curr − ProhibitionPeriod then
    Tabu ← TRUE;
end
return Tabu;



• Reactive Tabu Search was designed for discrete domains such as
combinatorial optimization, although it has been applied to
continuous function optimization.

• Reactive Tabu Search was proposed to use efficient memory data
structures such as hash tables.

• Reactive Tabu Search was proposed to use a long-term memory
to diversify the search after a threshold of cycle repetitions has
been reached.

• The increase parameter should be greater than one (such as 1.1 
or 1.3) and the decrease parameter should be less than one (such 
as 0.9 or 0.8). 
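
The heuristics on hash tables and on the increase and decrease parameters can be combined in a short sketch of the memory-based reaction. It is a sketch under stated assumptions rather than the original authors' implementation: the function names, the state keys, and the choice of the permutation itself as the hash key are all hypothetical, and two permutations that describe the same cyclic tour are treated as different solutions.

```ruby
# Hypothetical container for the reaction state. :visited maps a solution
# to the iteration of its last visit (a Hash gives fast average lookup).
def new_reaction_state
  {:visited => {}, :period => 1.0, :avg_interval => 1.0, :last_change => 0}
end

# Adapt the prohibition period (tabu tenure) after visiting a solution.
def react(state, solution, iter, problem_size, increase, decrease)
  key = solution.dup.freeze # assumption: the permutation is the hash key
  if state[:visited].key?(key)
    interval = iter - state[:visited][key]
    if interval < 2 * problem_size
      # a short cycle was detected: lengthen the tenure
      state[:avg_interval] = 0.1 * interval + 0.9 * state[:avg_interval]
      state[:period] *= increase
      state[:last_change] = iter
    end
  end
  state[:visited][key] = iter
  if iter - state[:last_change] > state[:avg_interval]
    # no recent reaction: relax the tenure, but never below 1
    state[:period] = [state[:period] * decrease, 1.0].max
    state[:last_change] = iter
  end
  state[:period]
end

state = new_reaction_state
tour_a, tour_b = [0, 1, 2, 3], [0, 2, 1, 3]
[tour_a, tour_b, tour_a, tour_b, tour_a].each_with_index do |tour, iter|
  period = react(state, tour, iter, 4, 1.3, 0.9)
  puts "iter=#{iter} tenure=#{period.round(3)}"
end
```

In this run the search alternates between two tours, so from the third visit onward each repetition multiplies the tenure by the increase parameter.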

2.11.5 Code Listing

Listing 2.10 provides an example of the Reactive Tabu Search algorithm
implemented in the Ruby Programming Language. The algorithm is
applied to the Berlin52 instance of the Traveling Salesman Problem
(TSP), taken from the TSPLIB. The problem seeks a permutation of
the order to visit cities (called a tour) that minimizes the total distance
traveled. The optimal tour distance for the Berlin52 instance is 7542 units.

The procedure is based on the code listing described by Battiti and 
Tecchiolli in [9] with supplements like the IsTabu function from [7]. The 
implementation does not use efficient memory data structures such as 
hash tables. The algorithm is initialized with a stochastic 2-opt local 
search, and the neighborhood is generated as a fixed candidate list of 
stochastic 2-opt moves. The edges selected for changing in the 2-opt 
move are stored as features in the tabu list. The example does not 
implement the escape procedure for search diversification. 

# rounded Euclidean distance between two cities
def euc_2d(c1, c2)
  Math.sqrt((c1[0] - c2[0])**2.0 + (c1[1] - c2[1])**2.0).round
end

# total length of the closed tour described by the permutation
def cost(perm, cities)
  distance = 0
  perm.each_with_index do |c1, i|
    c2 = (i==perm.size-1) ? perm[0] : perm[i+1]
    distance += euc_2d(cities[c1], cities[c2])
  end
  return distance
end

def random_permutation(cities)
  perm = Array.new(cities.size){|i| i}
  perm.each_index do |i|
    r = rand(perm.size-i) + i
    perm[r], perm[i] = perm[i], perm[r]
  end
  return perm
end

# returns the new permutation and the two edges deleted from the parent
def stochastic_two_opt(parent)
  perm = Array.new(parent)
  c1, c2 = rand(perm.size), rand(perm.size)
  exclude = [c1]
  exclude << ((c1==0) ? perm.size-1 : c1-1)
  exclude << ((c1==perm.size-1) ? 0 : c1+1)
  c2 = rand(perm.size) while exclude.include?(c2)
  c1, c2 = c2, c1 if c2 < c1
  perm[c1...c2] = perm[c1...c2].reverse
  return perm, [[parent[c1-1], parent[c1]], [parent[c2-1], parent[c2]]]
end

# an edge is tabu if it was last used within the prohibition period
def is_tabu?(edge, tabu_list, iter, prohib_period)
  tabu_list.each do |entry|
    if entry[:edge] == edge
      return true if entry[:iter] >= iter-prohib_period
      return false
    end
  end
  return false
end

# record (or refresh) the iteration at which an edge was last used
def make_tabu(tabu_list, edge, iter)
  tabu_list.each do |entry|
    if entry[:edge] == edge
      entry[:iter] = iter
      return entry
    end
  end
  entry = {:edge=>edge, :iter=>iter}
  tabu_list.push(entry)
  return entry
end

# undirected edge list of a tour, each edge stored smallest city first
def to_edge_list(perm)
  list = []
  perm.each_with_index do |c1, i|
    c2 = (i==perm.size-1) ? perm[0] : perm[i+1]
    c1, c2 = c2, c1 if c1 > c2
    list << [c1, c2]
  end
  return list
end

def equivalent?(el1, el2)
  el1.each {|e| return false if !el2.include?(e) }
  return true
end

def generate_candidate(best, cities)
  candidate = {}
  candidate[:vector], edges = stochastic_two_opt(best[:vector])
  candidate[:cost] = cost(candidate[:vector], cities)
  return candidate, edges
end

def get_candidate_entry(visited_list, permutation)
  edgeList = to_edge_list(permutation)
  visited_list.each do |entry|
    return entry if equivalent?(edgeList, entry[:edgelist])
  end
  return nil
end

def store_permutation(visited_list, permutation, iteration)
  entry = {}
  entry[:edgelist] = to_edge_list(permutation)
  entry[:iter] = iteration
  entry[:visits] = 1
  visited_list.push(entry)
  return entry
end

# split the candidates into tabu and admissible moves
def sort_neighborhood(candidates, tabu_list, prohib_period, iteration)
  tabu, admissable = [], []
  candidates.each do |a|
    if is_tabu?(a[1][0], tabu_list, iteration, prohib_period) or
      is_tabu?(a[1][1], tabu_list, iteration, prohib_period)
      tabu << a
    else
      admissable << a
    end
  end
  return [tabu, admissable]
end

def search(cities, max_cand, max_iter, increase, decrease)
  current = {:vector=>random_permutation(cities)}
  current[:cost] = cost(current[:vector], cities)
  best = current
  tabu_list, prohib_period = [], 1
  visited_list, avg_size, last_change = [], 1, 0
  max_iter.times do |iter|
    candidate_entry = get_candidate_entry(visited_list, current[:vector])
    if !candidate_entry.nil?
      repetition_interval = iter - candidate_entry[:iter]
      candidate_entry[:iter] = iter
      candidate_entry[:visits] += 1
      if repetition_interval < 2*(cities.size-1)
        # use the interval measured before the entry's timestamp was reset
        avg_size = 0.1*repetition_interval + 0.9*avg_size
        prohib_period = (prohib_period.to_f * increase)
        last_change = iter
      end
    else
      store_permutation(visited_list, current[:vector], iter)
    end
    if iter-last_change > avg_size
      prohib_period = [prohib_period*decrease,1].max
      last_change = iter
    end
    candidates = Array.new(max_cand) do |i|
      generate_candidate(current, cities)
    end
    candidates.sort! {|x,y| x.first[:cost] <=> y.first[:cost]}
    tabu,admis = sort_neighborhood(candidates,tabu_list,prohib_period,iter)
    if admis.size < 2
      prohib_period = cities.size-2
      last_change = iter
    end
    current,best_move_edges = (admis.empty?) ? tabu.first : admis.first
    if !tabu.empty?
      tf = tabu.first[0]
      # aspiration: accept a tabu move that beats the best and current
      if tf[:cost]<best[:cost] and tf[:cost]<current[:cost]
        current, best_move_edges = tabu.first
      end
    end
    best_move_edges.each {|edge| make_tabu(tabu_list, edge, iter)}
    best = candidates.first[0] if candidates.first[0][:cost] < best[:cost]
    puts " > it=#{iter}, tenure=#{prohib_period.round}, best=#{best[:cost]}"
  end
  return best
end

if __FILE__ == $0
  # problem configuration
  berlin52 = [[565,575],[25,185],[345,750],[945,685],[845,655],
   [880,660],[25,230],[525,1000],[580,1175],[650,1130],[1605,620],
   [1220,580],[1465,200],[1530,5],[845,680],[725,370],[145,665],
   [415,635],[510,875],[560,365],[300,465],[520,585],[480,415],
   [835,625],[975,580],[1215,245],[1320,315],[1250,400],[660,180],
   [410,250],[420,555],[575,665],[1150,1160],[700,580],[685,595],
   [685,610],[770,610],[795,645],[720,635],[760,650],[475,960],
   [95,260],[875,920],[700,500],[555,815],[830,485],[1170,65],
   [830,610],[605,625],[595,360],[1340,725],[1740,245]]
  # algorithm configuration
  max_iter = 100
  max_candidates = 50
  increase = 1.3
  decrease = 0.9
  # execute the algorithm
  best = search(berlin52, max_candidates, max_iter, increase, decrease)
  puts "Done. Best Solution: c=#{best[:cost]}, v=#{best[:vector].inspect}"
end



Listing 2.10: Reactive Tabu Search in Ruby 



2.11.6 References 
Primary Sources 

Reactive Tabu Search was proposed by Battiti and Tecchiolli as an 
extension to Tabu Search that included an adaptive tabu list size in 
addition to a diversification mechanism [7]. The technique also used 
efficient memory structures that were based on an earlier work by Battiti 
and Tecchiolli that considered a parallel tabu search [6]. Some early 
application papers by Battiti and Tecchiolli include a comparison to 
Simulated Annealing applied to the Quadratic Assignment Problem [8], 
benchmarked on instances of the knapsack problem and N-K models and 
compared with Repeated Local Minima Search, Simulated Annealing, 
and Genetic Algorithms [9], and training neural networks on an array 
of problem instances [10]. 

Learn More

Reactive Tabu Search was abstracted to a form called Reactive Local
Search that considers adaptive methods that learn suitable parameters
for heuristics that manage an embedded local search technique [4, 5].
Under this abstraction, the Reactive Tabu Search algorithm is a single
example of the Reactive Local Search principle applied to the Tabu
Search. This framework was further extended to the use of any machine
learning technique to adapt the parameters of an algorithm by
reacting to algorithm outcomes online while solving a problem, called
Reactive Search [1]. The best reference for this general framework is the
book on Reactive Search Optimization by Battiti, Brunato, and Mascia
[3]. Additionally, the review chapter by Battiti and Brunato provides a
contemporary description [2].

Evolutionary Algorithms



3.1 Overview 

This chapter describes Evolutionary Algorithms. 

3.1.1 Evolution 

Evolutionary Algorithms belong to the Evolutionary Computation field 
of study concerned with computational methods inspired by the process 
and mechanisms of biological evolution. The process of evolution by 
means of natural selection (descent with modification) was proposed by 
Darwin to account for the variety of life and its suitability (adaptive 
fit) for its environment. The mechanisms of evolution describe how 
evolution actually takes place through the modification and propagation 
of genetic material (proteins). Evolutionary Algorithms are concerned 
with investigating computational systems that resemble simplified ver- 
sions of the processes and mechanisms of evolution toward achieving 
the effects of these processes and mechanisms, namely the development 
of adaptive systems. Additional subject areas that fall within the realm 
of Evolutionary Computation are algorithms that seek to exploit the 
properties from the related fields of Population Genetics, Population 
Ecology, Coevolutionary Biology, and Developmental Biology. 

3.1.2 References 

Evolutionary Algorithms share properties of adaptation through an 
iterative process that accumulates and amplifies beneficial variation 
through trial and error. Candidate solutions represent members of a 
virtual population striving to survive in an environment defined by 
a problem specific objective function. In each case, the evolutionary 
process refines the adaptive fit of the population of candidate solutions 
in the environment, typically using surrogates for the mechanisms of
evolution such as genetic recombination and mutation. 

There are many excellent texts on the theory of evolution, although 
Darwin's original source can be an interesting and surprisingly enjoyable 
read [5]. Huxley's book defined the modern synthesis in evolutionary 
biology that combined Darwin's natural selection with Mendel's genetic 
mechanisms [25], although any good textbook on evolution will suffice 
(such as Futuyma's "Evolution" [13]). Popular science books on evolution 
are an easy place to start, such as Dawkins' "The Selfish Gene" that 
presents a gene-centric perspective on evolution [6], and Dennett's 
"Darwin's Dangerous Idea" that considers the algorithmic properties of 
the process [8]. 

Goldberg's classic text is still a valuable resource for the Genetic
Algorithm [14], and Holland's text is interesting for those looking to
learn about the research into adaptive systems that became the
Genetic Algorithm [23]. Additionally, the seminal work by Koza should
be considered for those interested in Genetic Programming [30], and
Schwefel's seminal work should be considered for those with an interest
in Evolution Strategies [34]. For an in-depth review of the history of
research into the use of simulated evolutionary processes for problem
solving, see Fogel [12]. For a rounded and modern review of the field
of Evolutionary Computation, Bäck, Fogel, and Michalewicz's two
volumes of "Evolutionary Computation" are an excellent resource covering
the major techniques, theory, and application-specific concerns [2, 3].
For some additional modern books on the unified field of Evolutionary
Computation and Evolutionary Algorithms, see De Jong [26], a recent
edition of Fogel [11], and Eiben and Smith [9].

3.1.3 Extensions 

There are many other algorithms and classes of algorithm from the field
of Evolutionary Computation that were not described, including but not
limited to:

• Distributed Evolutionary Computation: that are designed 
to partition a population across computer networks or computa- 
tional units such as the Distributed or 'Island Population' Genetic 
Algorithm [4, 35] and Diffusion Genetic Algorithms (also known 
as Cellular Genetic Algorithms) [1]. 

• Niching Genetic Algorithms: that form groups or sub-populations 
automatically within a population such as the Deterministic Crowd- 
ing Genetic Algorithm [31, 32], Restricted Tournament Selection 
[20, 21], and Fitness Sharing Genetic Algorithm [7, 19]. 

• Evolutionary Multiple Objective Optimization Algorithms:
such as Vector-Evaluated Genetic Algorithm (VEGA) [33], Pareto
Archived Evolution Strategy (PAES) [28, 29], and the Niched
Pareto Genetic Algorithm (NPGA) [24].

• Classical Techniques: such as GENITOR [36], and the CHC 
Genetic Algorithm [10]. 

• Competent Genetic Algorithms: (so-called [15]) such as the 
Messy Genetic Algorithm [17, 18], Fast Messy Genetic Algorithm 
[16], Gene Expression Messy Genetic Algorithm [27], and the 
Linkage-Learning Genetic Algorithm [22]. 

3.2 Genetic Algorithm

Genetic Algorithm, GA, Simple Genetic Algorithm, SGA, Canonical 
Genetic Algorithm, CGA. 

3.2.1 Taxonomy 

The Genetic Algorithm is an Adaptive Strategy and a Global Optimiza- 
tion technique. It is an Evolutionary Algorithm and belongs to the 
broader study of Evolutionary Computation. The Genetic Algorithm is 
a sibling of other Evolutionary Algorithms such as Genetic Programming 
(Section 3.3), Evolution Strategies (Section 3.4), Evolutionary Program- 
ming (Section 3.6), and Learning Classifier Systems (Section 3.9). The 
Genetic Algorithm is a parent of a large number of variant techniques 
and sub-fields too numerous to list. 

3.2.2 Inspiration 

The Genetic Algorithm is inspired by population genetics (including 
heredity and gene frequencies), and evolution at the population level, 
as well as the Mendelian understanding of the structure (such as
chromosomes, genes, alleles) and mechanisms (such as recombination and
mutation). This is the so-called new or modern synthesis of evolutionary
biology.

3.2.3 Metaphor 

Individuals of a population contribute their genetic material (called the
genotype) proportional to the suitability of their expressed genome
(called their phenotype) to their environment, in the form of offspring.
The next generation is created through a process of mating that involves
recombination of two individuals' genomes in the population with the
introduction of random copying errors (called mutation). This iterative
process may result in an improved adaptive fit between the phenotypes
of individuals in a population and the environment.

3.2.4 Strategy 

The objective of the Genetic Algorithm is to maximize the payoff of
candidate solutions in the population against a cost function from the
problem domain. The strategy for the Genetic Algorithm is to repeatedly
employ surrogates for the recombination and mutation genetic
mechanisms on the population of candidate solutions, where the cost
function (also known as objective or fitness function) applied to a
decoded representation of a candidate governs the probabilistic
contributions a given candidate solution can make to the subsequent
generation of candidate solutions.
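
One common way to let fitness govern a candidate's probabilistic contribution is fitness-proportionate (roulette wheel) selection. The following is a minimal sketch for a maximizing problem, assuming non-negative fitness values with a positive total; the name roulette_select and the optional argument r are hypothetical, with r exposed only so that the behavior can be checked deterministically.

```ruby
# Fitness-proportionate (roulette wheel) selection for a maximizing problem.
# Assumes non-negative fitness values with a positive total. r is a uniform
# random number in [0, 1).
def roulette_select(population, r = rand)
  total = population.sum { |c| c[:fitness] }.to_f
  threshold = r * total
  running = 0.0
  population.each do |candidate|
    running += candidate[:fitness]
    return candidate if threshold < running
  end
  population.last # guard against floating-point round-off
end

pop = [{:bitstring => "1110", :fitness => 3},
       {:bitstring => "1000", :fitness => 1},
       {:bitstring => "0000", :fitness => 0}]

counts = Hash.new(0)
10_000.times { counts[roulette_select(pop)[:bitstring]] += 1 }
p counts # "1110" is expected about three times as often as "1000"
```

A candidate with zero fitness is never selected under this scheme, which illustrates how greedy fitness-proportionate selection can become when fitness values differ widely.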



3.2.5 Procedure 

Algorithm 3.2.1 provides a pseudocode listing of the Genetic Algorithm 
for minimizing a cost function. 

Algorithm 3.2.1: Pseudocode for the Genetic Algorithm.
Input: Population_size, Problem_size, P_crossover, P_mutation
Output: S_best

Population ← InitializePopulation(Population_size, Problem_size);
EvaluatePopulation(Population);
S_best ← GetBestSolution(Population);
while ¬StopCondition() do
    Parents ← SelectParents(Population, Population_size);
    Children ← ∅;
    foreach Parent_1, Parent_2 ∈ Parents do
        Child_1, Child_2 ← Crossover(Parent_1, Parent_2, P_crossover);
        Children ← Mutate(Child_1, P_mutation);
        Children ← Mutate(Child_2, P_mutation);
    end
    EvaluatePopulation(Children);
    S_best ← GetBestSolution(Children);
    Population ← Replace(Population, Children);
end
return S_best;



3.2.6 Heuristics 

• Binary strings (referred to as 'bitstrings') are the classical represen- 
tation as they can be decoded to almost any desired representation. 
Real-valued and integer variables can be decoded using the binary 
coded decimal method, one's or two's complement methods, or 
the gray code method, the latter of which is generally preferred. 

• Problem specific representations and customized genetic operators 
should be adopted, incorporating as much prior information about 
the problem domain as possible. 

• The size of the population must be large enough to provide
sufficient coverage of the domain and mixing of the useful
sub-components of the solution [7].

• The Genetic Algorithm is classically configured with a high
probability of recombination (such as 95%-99% of the selected
population) and a low probability of mutation (such as 1/L, where L
is the number of components in a solution) [1, 18].

• The fitness-proportionate selection of candidate solutions to con- 
tribute to the next generation should be neither too greedy (to 
avoid the takeover of fitter candidate solutions) nor too random. 
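
The Gray code decoding mentioned in the first heuristic can be sketched as follows. This is an illustration under stated assumptions: bitstrings are read most significant bit first, and the names gray_to_int and decode_real, along with the mapping onto a real interval, are hypothetical.

```ruby
# Decode a Gray-coded bitstring (most significant bit first) to an integer.
def gray_to_int(bitstring)
  bit = 0
  bitstring.each_char.reduce(0) do |value, ch|
    bit ^= ch.to_i # binary bit = previous binary bit XOR current Gray bit
    (value << 1) | bit
  end
end

# Map a Gray-coded bitstring onto the real interval [min, max].
def decode_real(bitstring, min, max)
  max_int = (2**bitstring.size) - 1
  min + (max - min) * gray_to_int(bitstring) / max_int.to_f
end

puts gray_to_int("000") # 0
puts gray_to_int("001") # 1
puts gray_to_int("011") # 2
puts gray_to_int("010") # 3
puts gray_to_int("110") # 4
puts decode_real("100", -1.0, 1.0) # "100" is the Gray code for 7, giving 1.0
```

Consecutive integers differ by a single bit under this encoding, so a single point mutation can move a decoded value to an adjacent one.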

3.2.7 Code Listing 

Listing 3.1 provides an example of the Genetic Algorithm implemented 
in the Ruby Programming Language. The demonstration problem is 
a maximizing binary optimization problem called OneMax that seeks 
a binary string of unity (all '1' bits). The objective function provides 
only an indication of the number of correct bits in a candidate string, 
not the positions of the correct bits. 

The Genetic Algorithm is implemented with a conservative configu- 
ration including binary tournament selection for the selection operator, 
one-point crossover for the recombination operator, and point mutations 
for the mutation operator. 

def onemax(bitstring)
  sum = 0
  bitstring.size.times {|i| sum+=1 if bitstring[i].chr=='1'}
  return sum
end

def random_bitstring(num_bits)
  return (0...num_bits).inject(""){|s,i| s<<((rand<0.5) ? "1" : "0")}
end

def binary_tournament(pop)
  i, j = rand(pop.size), rand(pop.size)
  j = rand(pop.size) while j==i
  return (pop[i][:fitness] > pop[j][:fitness]) ? pop[i] : pop[j]
end

def point_mutation(bitstring, rate=1.0/bitstring.size)
  child = ""
  bitstring.size.times do |i|
    bit = bitstring[i].chr
    child << ((rand()<rate) ? ((bit=='1') ? "0" : "1") : bit)
  end
  return child
end

def crossover(parent1, parent2, rate)
  return ""+parent1 if rand()>=rate
  point = 1 + rand(parent1.size-2)
  return parent1[0...point]+parent2[point...(parent1.size)]
end

def reproduce(selected, pop_size, p_cross, p_mutation)
  children = []
  selected.each_with_index do |p1, i|
    p2 = (i.modulo(2)==0) ? selected[i+1] : selected[i-1]
    p2 = selected[0] if i == selected.size-1
    child = {}
    child[:bitstring] = crossover(p1[:bitstring], p2[:bitstring], p_cross)
    child[:bitstring] = point_mutation(child[:bitstring], p_mutation)
    children << child
    break if children.size >= pop_size
  end
  return children
end

def search(max_gens, num_bits, pop_size, p_crossover, p_mutation)
  population = Array.new(pop_size) do |i|
    {:bitstring=>random_bitstring(num_bits)}
  end
  population.each{|c| c[:fitness] = onemax(c[:bitstring])}
  best = population.sort{|x,y| y[:fitness] <=> x[:fitness]}.first
  max_gens.times do |gen|
    selected = Array.new(pop_size){|i| binary_tournament(population)}
    children = reproduce(selected, pop_size, p_crossover, p_mutation)
    children.each{|c| c[:fitness] = onemax(c[:bitstring])}
    children.sort!{|x,y| y[:fitness] <=> x[:fitness]}
    best = children.first if children.first[:fitness] >= best[:fitness]
    population = children
    puts " > gen #{gen}, best: #{best[:fitness]}, #{best[:bitstring]}"
    break if best[:fitness] == num_bits
  end
  return best
end

if __FILE__ == $0
  # problem configuration
  num_bits = 64
  # algorithm configuration
  max_gens = 100
  pop_size = 100
  p_crossover = 0.98
  p_mutation = 1.0/num_bits
  # execute the algorithm
  best = search(max_gens, num_bits, pop_size, p_crossover, p_mutation)
  puts "done! Solution: f=#{best[:fitness]}, s=#{best[:bitstring]}"
end



Listing 3.1: Genetic Algorithm in Ruby 

3.2.8 References
Primary Sources 

Holland is the grandfather of the field that became Genetic Algorithms.
Holland investigated adaptive systems in the late 1960s, proposing an
adaptive system formalism and adaptive strategies referred to as
'adaptive plans' [8-10]. Holland's theoretical framework was investigated
and elaborated by his Ph.D. students at the University of Michigan.
Rosenberg investigated a chemical and molecular model of a biologically
inspired adaptive plan [19]. Bagley investigated meta-environments and
a genetic adaptive plan referred to as a genetic algorithm applied to
a simple game called hexapawn [2]. Cavicchio further elaborated the
genetic adaptive plan by proposing numerous variations, referring to
some as 'reproductive plans' [15].

Other important contributions were made by Frantz, who investigated
what were referred to as genetic algorithms for search [3], and
Hollstien, who investigated genetic plans for adaptive control and
function optimization [12]. De Jong performed a seminal investigation of
the genetic adaptive model (genetic plans) applied to continuous function
optimization, and the suite of test problems he adopted is still commonly
used [13]. Holland wrote the seminal book on his research, focusing
on the proposed adaptive systems formalism and the reproductive and
genetic adaptive plans, and provided a theoretical framework for the
mechanisms used and an explanation for the capabilities of what would
become genetic algorithms [11].

Learn More 

The field of genetic algorithms is very large, resulting in large numbers 
of variations on the canonical technique. Goldberg provides a classical 
overview of the field in a review article [5], as does Mitchell [16]. Whitley 
describes a classical tutorial for the Genetic Algorithm covering both 
practical and theoretical concerns [20] . 

The algorithm is highly modular, and a sub-field exists to study 
each sub-process, specifically: selection, recombination, mutation, and 
representation. The Genetic Algorithm is most commonly used as an 
optimization technique, although it should also be considered a general 
adaptive strategy [14]. The schema theorem is a classical explanation 
for the power of the Genetic Algorithm proposed by Holland [11], and 
investigated by Goldberg under the name of the building block hypothesis 
[4]. 

The classical book on genetic algorithms as an optimization and 
machine learning technique was written by Goldberg and provides an 
in-depth review and practical study of the approach [4]. Mitchell provides 
a contemporary reference text introducing the technique and the field 
[17]. Finally, Goldberg provides a modern study of the field, the lessons 
learned, and reviews the broader toolset of optimization algorithms that 
the field has produced [6]. 

3.3 Genetic Programming 

Genetic Programming, GP. 

3.3.1 Taxonomy 

The Genetic Programming algorithm is an example of an Evolution- 
ary Algorithm and belongs to the field of Evolutionary Computation 
and more broadly Computational Intelligence and Biologically Inspired 
Computation. The Genetic Programming algorithm is a sibling to other 
Evolutionary Algorithms such as the Genetic Algorithm (Section 3.2), 
Evolution Strategies (Section 3.4), Evolutionary Programming (Sec- 
tion 3.6), and Learning Classifier Systems (Section 3.9). Technically, 
the Genetic Programming algorithm is an extension of the Genetic 
Algorithm. The Genetic Algorithm is a parent to a host of variations 
and extensions. 

3.3.2 Inspiration 

The Genetic Programming algorithm is inspired by population genetics 
(including heredity and gene frequencies), and evolution at the popu- 
lation level, as well as the Mendelian understanding of the structure 
(such as chromosomes, genes, alleles) and mechanisms (such as recombi- 
nation and mutation). This is the so-called new or modern synthesis of 
evolutionary biology. 

3.3.3 Metaphor 

Individuals of a population contribute their genetic material (called the 
genotype) in proportion to the suitability of their expressed genome 
(called their phenotype) to their environment. The next generation is 
created through a process of mating that involves genetic operators such 
as recombination of two individuals' genomes in the population and the 
introduction of random copying errors (called mutation). This iterative 
process may result in an improved adaptive fit between the phenotypes 
of individuals in a population and the environment. 

Programs may be evolved and used in a secondary adaptive process, 
where an assessment of candidates at the end of that secondary adaptive 
process is used for differential reproductive success in the first evolution- 
ary process. This system may be understood as the inter-dependencies 
experienced in evolutionary development where evolution operates upon 
an embryo that in turn develops into an individual in an environment 
that eventually may reproduce. 

3.3.4 Strategy 

The objective of the Genetic Programming algorithm is to use induction 
to devise a computer program. This is achieved by using evolutionary 
operators on candidate programs with a tree structure to improve the 
adaptive fit between the population of candidate programs and an 
objective function. An assessment of a candidate solution involves its 
execution. 

3.3.5 Procedure 

Algorithm 3.3.1 provides a pseudocode listing of the Genetic Program- 
ming algorithm for minimizing a cost function, based on Koza and Poli's 
tutorial [9]. 

The Genetic Program uses LISP-like symbolic expressions called 
S-expressions that represent the graph of a program with function nodes 
and terminal nodes. While the algorithm is running, the programs are 
treated like data, and when they are evaluated they are executed. The 
traversal of a program graph is always depth first, and functions must 
always return a value. 
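
The representation described above can be sketched as follows. This is a minimal illustration, assuming (as in Listing 3.2) that a program is a nested Ruby array of the form [function, arg1, arg2] and that terminals are either numeric constants or the input symbol 'X'. The names PROGRAM, to_sexp, and evaluate are hypothetical and chosen only for this example. The point is that the program is plain data until it is walked depth first and executed.

```ruby
# A program is data: a nested array [function, left, right].
# Terminals are numeric constants or the string 'X' (the input variable).
PROGRAM = [:+, [:*, 'X', 'X'], [:+, 'X', 1.0]]

# Render the tree as a LISP-like S-expression string.
def to_sexp(node)
  return node.to_s unless node.is_a?(Array)
  "(#{node[0]} #{to_sexp(node[1])} #{to_sexp(node[2])})"
end

# Execute the tree depth first; every function node returns a value.
def evaluate(node, x)
  return (node == 'X' ? x : node).to_f unless node.is_a?(Array)
  left = evaluate(node[1], x)
  right = evaluate(node[2], x)
  # Assumption: protected division returns zero, as Listing 3.2 does.
  return 0.0 if node[0] == :/ && right == 0.0
  left.send(node[0], right)
end

puts to_sexp(PROGRAM)        # prints "(+ (* X X) (+ X 1.0))"
puts evaluate(PROGRAM, 2.0)  # prints "7.0"
```

The same array can be copied, spliced, or regenerated by the genetic operators without ever being executed, which is what allows crossover and mutation to treat programs as ordinary data.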

3.3.6 Heuristics 

• The Genetic Programming algorithm was designed for inductive 
automatic programming and is well suited to symbolic regression, 
controller design, and machine learning tasks under the broader 
name of function approximation. 

• Traditionally Lisp symbolic expressions are evolved and evaluated 
in a virtual machine, although the approach has been applied with 
compiled programming languages. 

• The evaluation (fitness assignment) of a candidate solution typi- 
cally takes the structure of the program into account, rewarding 
parsimony. 

• The selection process should be balanced between random selection 
and greedy selection to bias the search towards fitter candidate 
solutions (exploitation), whilst introducing useful diversity into the 
population (exploration). 

• A program may respond to zero or more input values and may 
produce one or more outputs. 

• All functions used in the function node set must return a usable 
result. For example, the division function must return a sensible 
value (such as zero or one) when a division by zero occurs. 

Algorithm 3.3.1: Pseudocode for Genetic Programming. 
Input: Population_size, nodes_func, nodes_term, P_crossover, 
       P_mutation, P_reproduction, P_alteration 
Output: S_best 

Population ← InitializePopulation(Population_size, nodes_func, nodes_term); 
EvaluatePopulation(Population); 
S_best ← GetBestSolution(Population); 
while ¬StopCondition() do 
    Children ← ∅; 
    while Size(Children) < Population_size do 
        Operator ← SelectGeneticOperator(P_crossover, P_mutation, 
                                         P_reproduction, P_alteration); 
        if Operator ≡ CrossoverOperator then 
            Parent_1, Parent_2 ← SelectParents(Population, Population_size); 
            Child_1, Child_2 ← Crossover(Parent_1, Parent_2); 
            Children ← Child_1; 
            Children ← Child_2; 
        else if Operator ≡ MutationOperator then 
            Parent_1 ← SelectParents(Population, Population_size); 
            Child_1 ← Mutate(Parent_1); 
            Children ← Child_1; 
        else if Operator ≡ ReproductionOperator then 
            Parent_1 ← SelectParents(Population, Population_size); 
            Child_1 ← Reproduce(Parent_1); 
            Children ← Child_1; 
        else if Operator ≡ AlterationOperator then 
            Parent_1 ← SelectParents(Population, Population_size); 
            Child_1 ← AlterArchitecture(Parent_1); 
            Children ← Child_1; 
        end 
    end 
    EvaluatePopulation(Children); 
    S_best ← GetBestSolution(Children, S_best); 
    Population ← Children; 
end 
return S_best; 

• All genetic operations ensure (or should ensure) that syntactically 
valid and executable programs are produced as a result of their 
application. 

• The Genetic Programming algorithm is commonly configured with 
a high probability of crossover (≥ 90%) and a low probability 
of mutation (≤ 1%). Other operators such as reproduction and 
architecture alterations are used with moderate-level probabilities 
and fill in the probabilistic gap. 

• Architecture altering operations are not limited to the duplication 
and deletion of sub-structures of a given program. 

• The crossover genetic operator in the algorithm is commonly 
configured to select a function as the cross-point with a high 
probability (≥ 90%) and a terminal as the cross-point with a low 
probability (≤ 10%). 

• The function set may also include control structures such as con- 
ditional statements and loop constructs. 

• The Genetic Programming algorithm can be realized as a stack-based 
virtual machine as opposed to a call graph [11]. 

• The Genetic Programming algorithm can make use of Automat- 
ically Defined Functions (ADFs) that are sub-graphs and are 
promoted to the status of functions for reuse and are co-evolved 
with the programs. 

• The genetic operators employed during reproduction in the algo- 
rithm may be considered transformation programs for candidate 
solutions and may themselves be co-evolved in the algorithm [1]. 
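
The heuristic on biasing crossover points towards function nodes can be sketched as below. This is an illustrative sketch only, assuming programs are nested arrays of the form [function, arg1, arg2] with nodes numbered depth first from the root (index 0). The helper names index_nodes and biased_cross_point, and the default of 0.9, are assumptions made for this example. Unlike some implementations, the sketch allows the root to be chosen.

```ruby
# Collect [index, is_function] for every node, numbering nodes depth first
# (root = 0, then the left subtree, then the right subtree).
def index_nodes(node, acc=[])
  acc << [acc.size, node.is_a?(Array)]
  if node.is_a?(Array)
    index_nodes(node[1], acc)
    index_nodes(node[2], acc)
  end
  acc
end

# Pick a crossover point: a function node with probability p_func,
# otherwise a terminal. Falls back to any node if the chosen kind is absent.
def biased_cross_point(program, p_func=0.9, rng=Random.new)
  nodes = index_nodes(program)
  funcs = nodes.select { |_, is_func| is_func }.map(&:first)
  terms = nodes.reject { |_, is_func| is_func }.map(&:first)
  pool = (rng.rand < p_func) ? funcs : terms
  pool = funcs + terms if pool.empty?
  pool[rng.rand(pool.size)]
end

program = [:+, [:*, 'X', 'X'], [:+, 'X', 1.0]]
puts "cross point: #{biased_cross_point(program)}"
```

Selecting function nodes most of the time tends to exchange larger sub-trees, whereas selecting terminals exchanges single leaves and behaves more like a point mutation.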

3.3.7 Code Listing 

Listing 3.2 provides an example of the Genetic Programming algorithm 
implemented in the Ruby Programming Language based on Koza and 
Poli's tutorial [9]. 

The demonstration problem is an instance of a symbolic regression, 
where a function must be devised to match a set of observations. In 
this case the target function is a quadratic polynomial $x^2 + x + 1$ where 
$x \in [-1, 1]$. The observations are generated directly from the target 
function without noise for the purposes of this example. In practical 
problems, if one knew and had access to the target function then the 
genetic program would not be required. 

The algorithm is configured to search for a program with the function 
set {+, −, ×, ÷} and the terminal set {X, R}, where X is the input value, 
and R is a static random constant generated for a program, $R \in [-5, 5]$. 
A division by zero returns a value of zero. The fitness of a candidate 
solution is calculated by evaluating the program on a range of random 
input values and calculating the mean absolute error against the target. 
The algorithm is configured with a 90% probability of crossover, 8% 
probability of reproduction (copying), and a 2% probability of mutation. 
For brevity, the algorithm does not implement the architecture altering 
genetic operation and does not bias crossover points towards functions 
over terminals. 



def rand_in_bounds(min, max)
  return min + (max-min)*rand()
end

# render a program tree as an S-expression string
def print_program(node)
  return node if !node.kind_of?(Array)
  return "(#{node[0]} #{print_program(node[1])} #{print_program(node[2])})"
end

# execute a program tree depth first, 'map' binds terminal names to values
def eval_program(node, map)
  if !node.kind_of?(Array)
    return map[node].to_f if !map[node].nil?
    return node.to_f
  end
  arg1, arg2 = eval_program(node[1], map), eval_program(node[2], map)
  # protected division: a division by zero returns zero
  return 0 if node[0] === :/ and arg2 == 0.0
  return arg1.send(node[0], arg2)
end

def generate_random_program(max, funcs, terms, depth=0)
  if depth==max-1 or (depth>1 and rand()<0.1)
    t = terms[rand(terms.size)]
    return ((t=='R') ? rand_in_bounds(-5.0, +5.0) : t)
  end
  depth += 1
  arg1 = generate_random_program(max, funcs, terms, depth)
  arg2 = generate_random_program(max, funcs, terms, depth)
  return [funcs[rand(funcs.size)], arg1, arg2]
end

def count_nodes(node)
  return 1 if !node.kind_of?(Array)
  a1 = count_nodes(node[1])
  a2 = count_nodes(node[2])
  return a1+a2+1
end

def target_function(input)
  return input**2 + input + 1
end

# mean absolute error over random inputs in [-1, 1]
def fitness(program, num_trials=20)
  sum_error = 0.0
  num_trials.times do |i|
    input = rand_in_bounds(-1.0, 1.0)
    error = eval_program(program, {'X'=>input}) - target_function(input)
    sum_error += error.abs
  end
  return sum_error / num_trials.to_f
end

def tournament_selection(pop, bouts)
  selected = Array.new(bouts){pop[rand(pop.size)]}
  selected.sort!{|x,y| x[:fitness]<=>y[:fitness]}
  return selected.first
end

# return a copy of the tree with node number node_num (depth first) replaced
def replace_node(node, replacement, node_num, cur_node=0)
  return [replacement,(cur_node+1)] if cur_node == node_num
  cur_node += 1
  return [node,cur_node] if !node.kind_of?(Array)
  a1, cur_node = replace_node(node[1], replacement, node_num, cur_node)
  a2, cur_node = replace_node(node[2], replacement, node_num, cur_node)
  return [[node[0], a1, a2], cur_node]
end

def copy_program(node)
  return node if !node.kind_of?(Array)
  return [node[0], copy_program(node[1]), copy_program(node[2])]
end

# locate node number node_num using a depth first traversal
def get_node(node, node_num, current_node=0)
  return node,(current_node+1) if current_node == node_num
  current_node += 1
  return nil,current_node if !node.kind_of?(Array)
  a1, current_node = get_node(node[1], node_num, current_node)
  return a1,current_node if !a1.nil?
  a2, current_node = get_node(node[2], node_num, current_node)
  return a2,current_node if !a2.nil?
  return nil,current_node
end

# truncate a tree at max_depth by replacing deeper nodes with terminals
def prune(node, max_depth, terms, depth=0)
  if depth == max_depth-1
    t = terms[rand(terms.size)]
    return ((t=='R') ? rand_in_bounds(-5.0, +5.0) : t)
  end
  depth += 1
  return node if !node.kind_of?(Array)
  a1 = prune(node[1], max_depth, terms, depth)
  a2 = prune(node[2], max_depth, terms, depth)
  return [node[0], a1, a2]
end

# swap randomly selected (non-root) sub-trees between two parents
def crossover(parent1, parent2, max_depth, terms)
  pt1, pt2 = rand(count_nodes(parent1)-2)+1, rand(count_nodes(parent2)-2)+1
  tree1, c1 = get_node(parent1, pt1)
  tree2, c2 = get_node(parent2, pt2)
  child1, c1 = replace_node(parent1, copy_program(tree2), pt1)
  child1 = prune(child1, max_depth, terms)
  child2, c2 = replace_node(parent2, copy_program(tree1), pt2)
  child2 = prune(child2, max_depth, terms)
  return [child1, child2]
end

# replace a randomly selected node with a new random sub-tree
def mutation(parent, max_depth, functs, terms)
  random_tree = generate_random_program(max_depth/2, functs, terms)
  point = rand(count_nodes(parent))
  child, count = replace_node(parent, random_tree, point)
  child = prune(child, max_depth, terms)
  return child
end

def search(max_gens, pop_size, max_depth, bouts, p_repro, p_cross, p_mut, functs, terms)
  population = Array.new(pop_size) do |i|
    {:prog=>generate_random_program(max_depth, functs, terms)}
  end
  population.each{|c| c[:fitness] = fitness(c[:prog])}
  best = population.sort{|x,y| x[:fitness] <=> y[:fitness]}.first
  max_gens.times do |gen|
    children = []
    while children.size < pop_size
      # choose a genetic operator according to the configured probabilities
      operation = rand()
      p1 = tournament_selection(population, bouts)
      c1 = {}
      if operation < p_repro
        c1[:prog] = copy_program(p1[:prog])
      elsif operation < p_repro+p_cross
        p2 = tournament_selection(population, bouts)
        c2 = {}
        c1[:prog],c2[:prog] = crossover(p1[:prog], p2[:prog], max_depth, terms)
        children << c2
      elsif operation < p_repro+p_cross+p_mut
        c1[:prog] = mutation(p1[:prog], max_depth, functs, terms)
      end
      children << c1 if children.size < pop_size
    end
    children.each{|c| c[:fitness] = fitness(c[:prog])}
    population = children
    population.sort!{|x,y| x[:fitness] <=> y[:fitness]}
    best = population.first if population.first[:fitness] <= best[:fitness]
    puts " > gen #{gen}, fitness=#{best[:fitness]}"
    break if best[:fitness] == 0
  end
  return best
end

if __FILE__ == $0
  # problem configuration
  terms = ['X', 'R']
  functs = [:+, :-, :*, :/]
  # algorithm configuration
  max_gens = 100
  max_depth = 7
  pop_size = 100
  bouts = 5
  p_repro = 0.08
  p_cross = 0.90
  p_mut = 0.02
  # execute the algorithm
  best = search(max_gens, pop_size, max_depth, bouts, p_repro, p_cross, p_mut, functs, terms)
  puts "done! Solution: f=#{best[:fitness]}, #{print_program(best[:prog])}"
end

Listing 3.2: Genetic Programming in Ruby 



3.3.8 References 
Primary Sources 

An early work by Cramer involved the study of a Genetic Algorithm 
using an expression tree structure for representing computer programs 
for primitive mathematical operations [3]. Koza is credited with the 
development of the field of Genetic Programming. An early paper by 
Koza referred to his hierarchical genetic algorithms as an extension to the 
simple genetic algorithm that use symbolic expressions (S-expressions) as 
a representation and were applied to a range of induction-style problems 
[4]. The seminal reference for the field is Koza's 1992 book on Genetic 
Programming [5]. 

Learn More 

The field of Genetic Programming is vast, including many books, ded- 
icated conferences and thousands of publications. Koza is generally 
credited with the development and popularizing of the field, publishing 
a large number of books and papers himself. Koza provides a practical 
introduction to the field as a tutorial and provides a recent overview of 
the broader field and usage of the technique [9]. 

In addition to his seminal 1992 book, Koza has released three more 
volumes in the series, including volume II on Automatically Defined 
Functions (ADFs) [6], volume III that considered the Genetic Programming 
Problem Solver (GPPS) for automatically defining the function 
set and program structure for a given problem [7], and volume IV that 
focuses on the human-competitive results the technique is able to achieve 
in a routine manner [8]. All books are rich with targeted and practical 
demonstration problem instances. 

Some additional excellent books include a text by Banzhaf et al. that 
provides an introduction to the field [2], Langdon and Poli's detailed look 
at the technique [10], and Poli, Langdon, and McPhee's contemporary 
and practical field guide to Genetic Programming [12]. 

3.4 Evolution Strategies 

Evolution Strategies, Evolution Strategy, Evolutionary Strategies, ES. 

3.4.1 Taxonomy 

Evolution Strategies is a global optimization algorithm and is an in- 
stance of an Evolutionary Algorithm from the field of Evolutionary 
Computation. Evolution Strategies is a sibling technique to other Evo- 
lutionary Algorithms such as Genetic Algorithms (Section 3.2), Genetic 
Programming (Section 3.3), Learning Classifier Systems (Section 3.9), 
and Evolutionary Programming (Section 3.6). A popular descendant of 
the Evolution Strategies algorithm is the Covariance Matrix Adaptation 
Evolution Strategies (CMA-ES). 

3.4.2 Inspiration 

Evolution Strategies is inspired by the theory of evolution by means of 
natural selection. Specifically, the technique is inspired by the macro-level 
or species-level process of evolution (phenotype, heredity, variation) 
and is not concerned with the genetic mechanisms of evolution (genome, 
chromosomes, genes, alleles). 

3.4.3 Strategy 

The objective of the Evolution Strategies algorithm is to maximize 
the suitability of a collection of candidate solutions in the context of an 
objective function from a domain. The objective was classically achieved 
through the adoption of dynamic variation, a surrogate for descent with 
modification, where the amount of variation was adapted dynamically 
with performance-based heuristics. Contemporary approaches co-adapt 
parameters that control the amount and bias of variation with the 
candidate solutions. 

3.4.4 Procedure 

Instances of Evolution Strategies algorithms may be concisely described 
with a custom terminology in the form (μ, λ)-ES, where μ is the number 
of candidate solutions in the parent generation, and λ is the number 
of candidate solutions generated from the parent generation. In this 
configuration, the best μ are kept if λ > μ, where λ must be greater than 
or equal to μ. In addition to the so-called comma-selection Evolution 
Strategies algorithm, a plus-selection variation may be defined as 
(μ + λ)-ES, where the best members of the union of the μ and λ generations 
compete based on objective fitness for a position in the next generation. 
The simplest configuration is the (1 + 1)-ES, which is a type of greedy hill 
climbing algorithm. Algorithm 3.4.1 provides a pseudocode listing of 
the (μ, λ)-ES algorithm for minimizing a cost function. The algorithm 
shows the adaptation of candidate solutions that co-adapt their own 
strategy parameters that influence the amount of mutation applied to a 
candidate solution's descendants. 

Algorithm 3.4.1: Pseudocode for (μ, λ) Evolution Strategies. 
Input: μ, λ, ProblemSize 
Output: S_best 

Population ← InitializePopulation(μ, ProblemSize); 
EvaluatePopulation(Population); 
S_best ← GetBest(Population, 1); 
while ¬StopCondition() do 
    Children ← ∅; 
    for i = 0 to λ do 
        Parent_i ← GetParent(Population, i); 
        S_i ← ∅; 
        S_i.problem ← Mutate(P_i.problem, P_i.strategy); 
        S_i.strategy ← Mutate(P_i.strategy); 
        Children ← S_i; 
    end 
    EvaluatePopulation(Children); 
    S_best ← GetBest(Children + S_best, 1); 
    Population ← SelectBest(Population, Children, μ); 
end 
return S_best; 
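
The difference between comma-selection and plus-selection can be sketched as follows. This is a minimal illustration, assuming candidates are hashes with a :fitness key where lower is better. The function names comma_selection and plus_selection are hypothetical and used only for this example.

```ruby
# (mu, lambda): parents are discarded; survivors come from the children only.
def comma_selection(_parents, children, mu)
  children.sort_by { |c| c[:fitness] }.first(mu)
end

# (mu + lambda): parents compete with their children for survival.
def plus_selection(parents, children, mu)
  (parents + children).sort_by { |c| c[:fitness] }.first(mu)
end

parents  = [{:fitness=>0.1}, {:fitness=>2.0}]
children = [{:fitness=>0.5}, {:fitness=>1.0}, {:fitness=>3.0}]
p comma_selection(parents, children, 2).map { |c| c[:fitness] }  # => [0.5, 1.0]
p plus_selection(parents, children, 2).map { |c| c[:fitness] }   # => [0.1, 0.5]
```

In the example the best parent (0.1) is lost under comma-selection but retained under plus-selection, which is why the plus variant never loses its best-so-far solution while the comma variant can keep moving when the landscape changes.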



3.4.5 Heuristics 

• Evolution Strategies uses problem specific representations, such 
as real values for continuous function optimization. 

• The algorithm is commonly configured such that 1 ≤ μ ≤ λ. 

• The ratio of μ to λ influences the amount of selection pressure 
(greediness) exerted by the algorithm. 

• A contemporary update to the algorithm's notation includes a ρ, 
as in (μ/ρ, λ)-ES, that specifies the number of parents that will 
contribute to each new candidate solution using a recombination 
operator. 

• A classical rule used to govern the amount of mutation (standard 
deviation used in mutation for continuous function optimization) 
was the 1/5-rule, where the ratio of successful mutations should 
be 1/5 of all mutations. If it is greater the variance is increased, 
otherwise if the ratio is less, the variance is decreased. 

• The comma-selection variation of the algorithm can be good for 
dynamic problem instances given its capability for continued ex- 
ploration of the search space, whereas the plus-selection variation 
can be good for refinement and convergence. 
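
The classical success rule for adapting the mutation step size can be sketched as below. This is an illustrative sketch rather than a definitive implementation: the function name adapt_stdev and the adjustment factor of 0.85 are assumptions made for this example, and in practice the success ratio would be measured over a window of recent mutations.

```ruby
# Adapt a mutation standard deviation using the 1/5 success rule.
# successes / trials is the fraction of mutations that improved on the parent.
# Assumption: 'factor' (0.85 here) is an illustrative adjustment constant.
def adapt_stdev(stdev, successes, trials, factor=0.85)
  ratio = successes.to_f / trials
  return stdev / factor if ratio > 0.2  # many successes: take larger steps
  return stdev * factor if ratio < 0.2  # few successes: take smaller steps
  stdev                                 # exactly 1/5: leave unchanged
end

puts adapt_stdev(1.0, 3, 10) > 1.0   # success ratio 0.3, step size grows
puts adapt_stdev(1.0, 1, 10) < 1.0   # success ratio 0.1, step size shrinks
```

The intuition is that a high success rate suggests the search is taking overly cautious steps, whereas a low success rate suggests it is overshooting.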

3.4.6 Code Listing 

Listing 3.3 provides an example of the Evolution Strategies algorithm 
implemented in the Ruby Programming Language. The demonstration 
problem is an instance of a continuous function optimization that seeks 
$\min f(x)$ where $f = \sum_{i=1}^{n} x_i^2$, $-5.0 \le x_i \le 5.0$ and $n = 2$. The optimal 
solution for this basin function is $(v_0, \ldots, v_{n-1}) = 0.0$. The algorithm 
is an implementation of Evolution Strategies based on the simple version 
described by Bäck and Schwefel [2], which was also used as the basis of 
a detailed empirical study [11]. The algorithm is a (30 + 20)-ES that 
adapts both the problem and strategy (standard deviations) variables. 
More contemporary implementations may modify the strategy variables 
differently, and include an additional set of adapted strategy parameters 
to influence the direction of mutation (see [7] for a concise description). 

def objective_function(vector)
  return vector.inject(0.0) {|sum, x| sum + (x ** 2.0)}
end

def random_vector(minmax)
  return Array.new(minmax.size) do |i|
    minmax[i][0] + ((minmax[i][1] - minmax[i][0]) * rand())
  end
end

# sample a Gaussian random number using the polar Box-Muller method
def random_gaussian(mean=0.0, stdev=1.0)
  u1 = u2 = w = 0
  begin
    u1 = 2 * rand() - 1
    u2 = 2 * rand() - 1
    w = u1 * u1 + u2 * u2
  end while w >= 1
  w = Math.sqrt((-2.0 * Math.log(w)) / w)
  return mean + (u2 * w) * stdev
end

# perturb each variable by its own standard deviation, clamped to the bounds
def mutate_problem(vector, stdevs, search_space)
  child = Array.new(vector.size)
  vector.each_with_index do |v, i|
    child[i] = v + stdevs[i] * random_gaussian()
    child[i] = search_space[i][0] if child[i] < search_space[i][0]
    child[i] = search_space[i][1] if child[i] > search_space[i][1]
  end
  return child
end

# self-adapt the standard deviations using log-normal perturbations
def mutate_strategy(stdevs)
  tau = Math.sqrt(2.0*stdevs.size.to_f)**-1.0
  tau_p = Math.sqrt(2.0*Math.sqrt(stdevs.size.to_f))**-1.0
  child = Array.new(stdevs.size) do |i|
    stdevs[i] * Math.exp(tau_p*random_gaussian() + tau*random_gaussian())
  end
  return child
end

def mutate(par, minmax)
  child = {}
  child[:vector] = mutate_problem(par[:vector], par[:strategy], minmax)
  child[:strategy] = mutate_strategy(par[:strategy])
  return child
end

def init_population(minmax, pop_size)
  # initial standard deviations are drawn from [0, 5% of each variable's range]
  strategy = Array.new(minmax.size) do |i|
    [0, (minmax[i][1]-minmax[i][0]) * 0.05]
  end
  pop = Array.new(pop_size) { Hash.new }
  pop.each_index do |i|
    pop[i][:vector] = random_vector(minmax)
    pop[i][:strategy] = random_vector(strategy)
  end
  pop.each{|c| c[:fitness] = objective_function(c[:vector])}
  return pop
end

def search(max_gens, search_space, pop_size, num_children)
  population = init_population(search_space, pop_size)
  best = population.sort{|x,y| x[:fitness] <=> y[:fitness]}.first
  max_gens.times do |gen|
    children = Array.new(num_children) do |i|
      mutate(population[i], search_space)
    end
    children.each{|c| c[:fitness] = objective_function(c[:vector])}
    # plus-selection: parents and children compete for survival
    union = children+population
    union.sort!{|x,y| x[:fitness] <=> y[:fitness]}
    best = union.first if union.first[:fitness] < best[:fitness]
    population = union.first(pop_size)
    puts " > gen #{gen}, fitness=#{best[:fitness]}"
  end
  return best
end

if __FILE__ == $0
  # problem configuration
  problem_size = 2
  search_space = Array.new(problem_size) {|i| [-5, +5]}
  # algorithm configuration
  max_gens = 100
  pop_size = 30
  num_children = 20
  # execute the algorithm
  best = search(max_gens, search_space, pop_size, num_children)
  puts "done! Solution: f=#{best[:fitness]}, s=#{best[:vector].inspect}"
end

Listing 3.3: Evolution Strategies in Ruby 

3.4.7 References 
Primary Sources 

Evolution Strategies was developed by three students (Bienert, Rechen- 
berg, Schwefel) at the Technical University in Berlin in 1964 in an effort 
to robotically optimize an aerodynamics design problem. The seminal 
work in Evolution Strategies was Rechenberg's PhD thesis [5] that was 
later published as a book [6], both in German. Many technical reports 
and papers were published by Schwefel and Rechenberg, although the 
seminal paper published in English was by Klockgether and Schwefel on 
the two-phase nozzle design problem [4]. 

Learn More 

Schwefel published his PhD dissertation [8] not long after Rechenberg, 
which was also published as a book [9], both in German. Schwefel's book 
was later translated into English and represents a classical reference for 
the technique [10]. Bäck et al. provide a classical introduction to the 
technique, covering the history, development of the algorithm, and the 
steps that led it to where it was in 1991 [1]. Beyer and Schwefel provide 
a contemporary introduction to the field that includes a detailed history 
of the approach, the developments and improvements since its inception, 
and an overview of the theoretical findings that have been made [3]. 

3.4.8 Bibliography 
strategies. In Proceedings of the Fourth International Conference 
on Genetic Algorithms, pages 2-9, 1991. 

3.5 Differential Evolution 

Differential Evolution, DE. 

3.5.1 Taxonomy 

Differential Evolution is a Stochastic Direct Search and Global Optimiza- 
tion algorithm, and is an instance of an Evolutionary Algorithm from the 
field of Evolutionary Computation. It is related to sibling Evolutionary 
Algorithms such as the Genetic Algorithm (Section 3.2), Evolutionary 
Programming (Section 3.6), and Evolution Strategies (Section 3.4), and 
has some similarities with Particle Swarm Optimization (Section 6.2). 

3.5.2 Strategy 

The Differential Evolution algorithm involves maintaining a population of 
candidate solutions subjected to iterations of recombination, evaluation, 
and selection. The recombination approach involves the creation of new 
candidate solution components based on the weighted difference between 
two randomly selected population members added to a third population 
member. This perturbs population members relative to the spread of 
the broader population. In conjunction with selection, the perturbation 
effect self-organizes the sampling of the problem space, bounding it to 
known areas of interest. 
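
The weighted-difference perturbation described above can be sketched as follows. This is a minimal illustration, assuming candidate solutions are plain arrays of floats. The function name perturb and the example values are hypothetical.

```ruby
# Perturb a base vector with the weighted difference of two other vectors:
#   v = base + f * (a - b)
def perturb(base, a, b, f)
  base.each_index.map { |i| base[i] + f * (a[i] - b[i]) }
end

# When the population is widely spread, the differences (and steps) are large.
p perturb([0.0, 0.0], [4.0, -3.0], [-4.0, 3.0], 0.5)    # => [4.0, -3.0]
# When the population has converged, the steps shrink automatically.
p perturb([0.0, 0.0], [0.5, -0.25], [-0.5, 0.25], 0.5)  # => [0.5, -0.25]
```

Because the step is built from differences between population members, its scale tracks the spread of the population without any explicit step-size parameter beyond the weighting factor.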

3.5.3 Procedure 

Differential Evolution has a specialized nomenclature that describes 
the adopted configuration. This takes the form of DE/x/y/z, where x 
represents the solution to be perturbed (such as random or best). The 
y signifies the number of difference vectors used in the perturbation of 
x, where a difference vector is the difference between two randomly 
selected, distinct members of the population. Finally, z signifies 
the recombination operator performed, such as bin for binomial and exp 
for exponential. 
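
The two recombination operators named by z can be contrasted with a short sketch. This shows one common formulation of each, assuming vectors are plain arrays, that target is the existing population member, and that mutant is the perturbed vector. The function names crossover_bin and crossover_exp and the rng parameter are assumptions introduced for this example.

```ruby
# Binomial ('bin'): each component is taken from the mutant independently
# with probability cr; one randomly chosen component is always taken.
def crossover_bin(target, mutant, cr, rng=Random.new)
  forced = rng.rand(target.size)
  target.each_index.map do |i|
    (i == forced || rng.rand < cr) ? mutant[i] : target[i]
  end
end

# Exponential ('exp'): starting at a random index, copy a run of consecutive
# components from the mutant (wrapping around) while rand < cr.
def crossover_exp(target, mutant, cr, rng=Random.new)
  n = target.size
  trial = target.dup
  j = rng.rand(n)
  copied = 0
  loop do
    trial[j] = mutant[j]
    j = (j + 1) % n
    copied += 1
    break unless rng.rand < cr && copied < n
  end
  trial
end

target = [0.0, 0.0, 0.0, 0.0, 0.0]
mutant = [1.0, 1.0, 1.0, 1.0, 1.0]
p crossover_bin(target, mutant, 0.5)
p crossover_exp(target, mutant, 0.5)
```

Both operators always take at least one component from the mutant. The binomial form scatters the inherited components across the vector, whereas the exponential form inherits them as one contiguous block.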

Algorithm 3.5.1 provides a pseudocode listing of the Differential 
Evolution algorithm for minimizing a cost function, specifically a 
DE/rand/1/bin configuration. Algorithm 3.5.2 provides a pseudocode listing 
of the NewSample function from the Differential Evolution algorithm. 

3.5.4 Heuristics 

• Differential evolution was designed for nonlinear, non-differentiable 
continuous function optimization. 

Algorithm 3.5.1: Pseudocode for Differential Evolution. 
Input: Population_size, Problem_size, Weighting_factor, Crossover_rate 
Output: S_best 

Population ← InitializePopulation(Population_size, Problem_size); 
EvaluatePopulation(Population); 
S_best ← GetBestSolution(Population); 
while ¬StopCondition() do 
    NewPopulation ← ∅; 
    foreach P_i ∈ Population do 
        S_i ← NewSample(P_i, Population, Problem_size, 
                        Weighting_factor, Crossover_rate); 
        if Cost(S_i) ≤ Cost(P_i) then 
            NewPopulation ← S_i; 
        else 
            NewPopulation ← P_i; 
        end 
    end 
    Population ← NewPopulation; 
    EvaluatePopulation(Population); 
    S_best ← GetBestSolution(Population); 
end 
return S_best; 



• The weighting factor F ∈ [0, 2] controls the amplification of differ- 
ential variation, a value of 0.8 is suggested. 

• The crossover weight CR ∈ [0, 1] probabilistically controls the 
amount of recombination, a value of 0.9 is suggested. 

• The initial population of candidate solutions should be randomly 
generated from within the space of valid solutions. 

• The popular configurations are DE/rand/1/* and DE/best/2/*. 



3.5.5 Code Listing 

Listing 3.4 provides an example of the Differential Evolution algorithm 
implemented in the Ruby Programming Language. The demonstration 
problem is an instance of a continuous function optimization that seeks 
$\min f(x)$ where $f = \sum_{i=1}^{n} x_i^2$, $-5.0 \le x_i \le 5.0$ and $n = 3$. The optimal 
solution for this basin function is $(v_0, \ldots, v_{n-1}) = 0.0$. The algorithm is 
an implementation of Differential Evolution with the DE/rand/1/bin 

Algorithm 3.5.2: Pseudocode for the NewSample function. 
Input: P_0, Population, NP, F, CR 
Output: S 

repeat 
    P_1 ← RandomMember(Population); 
until P_1 ≠ P_0; 
repeat 
    P_2 ← RandomMember(Population); 
until P_2 ≠ P_0 ∧ P_2 ≠ P_1; 
repeat 
    P_3 ← RandomMember(Population); 
until P_3 ≠ P_0 ∧ P_3 ≠ P_1 ∧ P_3 ≠ P_2; 
CutPoint ← RandomPosition(NP); 
S ← ∅; 
for i to NP do 
    if i ≡ CutPoint ∨ Rand() < CR then 
        S_i ← P_3i + F × (P_1i − P_2i); 
    else 
        S_i ← P_0i; 
    end 
end 
return S; 



configuration proposed by Storn and Price [9]. 



def objective_function(vector)
  return vector.inject(0.0) {|sum, x| sum + (x ** 2.0)}
end

def random_vector(minmax)
  return Array.new(minmax.size) do |i|
    minmax[i][0] + ((minmax[i][1] - minmax[i][0]) * rand())
  end
end

# create a new sample from p0 using binomial crossover with the
# perturbed vector p3 + f * (p1 - p2), clamped to the search space
def de_rand_1_bin(p0, p1, p2, p3, f, cr, search_space)
  sample = {:vector=>Array.new(p0[:vector].size)}
  cut = rand(sample[:vector].size-1) + 1
  sample[:vector].each_index do |i|
    sample[:vector][i] = p0[:vector][i]
    if (i==cut or rand() < cr)
      v = p3[:vector][i] + f * (p1[:vector][i] - p2[:vector][i])
      v = search_space[i][0] if v < search_space[i][0]
      v = search_space[i][1] if v > search_space[i][1]
      sample[:vector][i] = v
    end
  end
  return sample
end



# select three distinct population indices, all different from 'current'
def select_parents(pop, current)
  p1, p2, p3 = rand(pop.size), rand(pop.size), rand(pop.size)
  p1 = rand(pop.size) until p1 != current
  p2 = rand(pop.size) until p2 != current and p2 != p1
  p3 = rand(pop.size) until p3 != current and p3 != p1 and p3 != p2
  return [p1,p2,p3]
end

def create_children(pop, minmax, f, cr)
  children = []
  pop.each_with_index do |p0, i|
    p1, p2, p3 = select_parents(pop, i)
    children << de_rand_1_bin(p0, pop[p1], pop[p2], pop[p3], f, cr, minmax)
  end
  return children
end

# each child replaces its own parent only if it is at least as good
def select_population(parents, children)
  return Array.new(parents.size) do |i|
    (children[i][:cost]<=parents[i][:cost]) ? children[i] : parents[i]
  end
end

def search(max_gens, search_space, pop_size, f, cr)
  pop = Array.new(pop_size) {|i| {:vector=>random_vector(search_space)}}
  pop.each{|c| c[:cost] = objective_function(c[:vector])}
  best = pop.sort{|x,y| x[:cost] <=> y[:cost]}.first
  max_gens.times do |gen|
    children = create_children(pop, search_space, f, cr)
    children.each{|c| c[:cost] = objective_function(c[:vector])}
    pop = select_population(pop, children)
    pop.sort!{|x,y| x[:cost] <=> y[:cost]}
    best = pop.first if pop.first[:cost] < best[:cost]
    puts " > gen #{gen+1}, fitness=#{best[:cost]}"
  end
  return best
end

if __FILE__ == $0
  # problem configuration
  problem_size = 3
  search_space = Array.new(problem_size) {|i| [-5, +5]}
  # algorithm configuration
  max_gens = 200
  pop_size = 10*problem_size
  weightf = 0.8
  crossf = 0.9
  # execute the algorithm
  best = search(max_gens, search_space, pop_size, weightf, crossf)
  puts "done! Solution: f=#{best[:cost]}, s=#{best[:vector].inspect}"
end
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i 

Listing 3.4: Differential Evolution in Ruby 

3.5.6 References 
Primary Sources 

The Differential Evolution algorithm was presented by Storn and Price in
a technical report that considered DE1 and DE2 variants of the approach
applied to a suite of continuous function optimization problems [7]. An
early paper by Storn applied the approach to the optimization of an
IIR-filter (Infinite Impulse Response) [5]. A second early paper applied the
approach to a second suite of benchmark problem instances, adopting
the contemporary nomenclature for describing the approach, including
the DE/rand/1/* and DE/best/2/* variations [8]. The early work
including technical reports and conference papers by Storn and Price
culminated in a seminal journal article [9].

Learn More 

A classical overview of Differential Evolution was presented by Price 
and Storn [2], and a terse introduction to the approach for function
optimization is presented by Storn [6]. A seminal extended description 
of the algorithm with sample applications was presented by Storn and 
Price as a book chapter [3]. Price, Storn, and Lampinen released a 
contemporary book dedicated to Differential Evolution including theory, 
benchmarks, sample code, and numerous application demonstrations 
[4]. Chakraborty also released a book considering extensions to address 
complexities such as rotation invariance and stopping criteria [1]. 

3.5.7 Bibliography

[5] R. Storn. Differential evolution design of an IIR-filter. In Proceedings
IEEE Conference Evolutionary Computation, pages 268-273. IEEE,
1996.



3.6 Evolutionary Programming

Evolutionary Programming, EP. 

3.6.1 Taxonomy 

Evolutionary Programming is a Global Optimization algorithm and 
is an instance of an Evolutionary Algorithm from the field of Evolu- 
tionary Computation. The approach is a sibling of other Evolutionary 
Algorithms such as the Genetic Algorithm (Section 3.2), and Learning 
Classifier Systems (Section 3.9). It is sometimes confused with Genetic 
Programming given the similarity in name (Section 3.3), and more 
recently it shows a strong functional similarity to Evolution Strategies 
(Section 3.4). 

3.6.2 Inspiration 

Evolutionary Programming is inspired by the theory of evolution by 
means of natural selection. Specifically, the technique is inspired by 
macro-level or the species-level process of evolution (phenotype, heredi- 
tary, variation) and is not concerned with the genetic mechanisms of 
evolution (genome, chromosomes, genes, alleles). 

3.6.3 Metaphor 

A population of a species reproduce, creating progeny with small pheno- 
typical variation. The progeny and the parents compete based on their 
suitability to the environment, where the generally more fit members 
constitute the subsequent generation and are provided with the oppor- 
tunity to reproduce themselves. This process repeats, improving the 
adaptive fit between the species and the environment. 

3.6.4 Strategy 

The objective of the Evolutionary Programming algorithm is to maximize 
the suitability of a collection of candidate solutions in the context of 
an objective function from the domain. This objective is pursued by 
using an adaptive model with surrogates for the processes of evolution, 
specifically hereditary (reproduction with variation) under competition. 
The representation used for candidate solutions is directly assessable by 
a cost or objective function from the domain. 

3.6.5 Procedure 

Algorithm 3.6.1 provides a pseudocode listing of the Evolutionary
Programming algorithm for minimizing a cost function.

Algorithm 3.6.1: Pseudocode for Evolutionary Programming.
Input: Population_size, ProblemSize, BoutSize
Output: S_best

Population ← InitializePopulation(Population_size, ProblemSize);
EvaluatePopulation(Population);
S_best ← GetBestSolution(Population);
while ¬StopCondition() do
    Children ← ∅;
    foreach Parent_i ∈ Population do
        Child_i ← Mutate(Parent_i);
        Children ← Child_i;
    end
    EvaluatePopulation(Children);
    S_best ← GetBestSolution(Children, S_best);
    Union ← Population + Children;
    foreach S_i ∈ Union do
        for 1 to BoutSize do
            S_j ← RandomSelection(Union);
            if Cost(S_i) < Cost(S_j) then
                S_i.wins ← S_i.wins + 1;
            end
        end
    end
    Population ← SelectBestByWins(Union, Population_size);
end
return S_best;



3.6.6 Heuristics 

• The representation for candidate solutions should be domain spe- 
cific, such as real numbers for continuous function optimization. 

• The sample size (bout size) for tournament selection during com- 
petition is commonly between 5% and 10% of the population 
size. 

• Evolutionary Programming traditionally only uses the mutation 
operator to create new candidate solutions from existing can- 
didate solutions. The crossover operator that is used in some 
other Evolutionary Algorithms is not employed in Evolutionary 
Programming. 

• Evolutionary Programming is concerned with the linkage between
parent and child candidate solutions and is not concerned with
surrogates for genetic mechanisms.

• Continuous function optimization is a popular application for
the approach, where real-valued representations are used with a
Gaussian-based mutation operator.

• The mutation-specific parameters used in the application of the
algorithm to continuous function optimization can be adapted in
concert with the candidate solutions [4].
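
The bout-size guideline above can be turned into a small helper. This is only a sketch: the function name bout_size_for and its default fraction of 5% are assumptions made for illustration, and the result is floored at one so that small populations still hold a competition.

```ruby
# Hypothetical helper: derive a tournament bout size from the population size.
# fraction is expected to lie between 0.05 and 0.10 per the heuristic.
def bout_size_for(pop_size, fraction=0.05)
  unless fraction > 0.0 && fraction <= 1.0
    raise ArgumentError, "fraction must be in (0, 1]"
  end
  return [1, (pop_size * fraction).round].max
end

puts bout_size_for(100)        # 5% of 100
puts bout_size_for(100, 0.10)  # 10% of 100
```

With a population of 100 the 5% setting gives a bout size of 5, which matches the configuration used in the code listing for this section.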

3.6.7 Code Listing 

Listing 3.5 provides an example of the Evolutionary Programming
algorithm implemented in the Ruby Programming Language. The
demonstration problem is an instance of a continuous function optimization
that seeks $\min f(x)$ where $f = \sum_{i=1}^{n} x_i^2$, $-5.0 \leq x_i \leq 5.0$ and $n = 2$.
The optimal solution for this basin function is $(v_0, \ldots, v_{n-1}) = 0.0$. The
algorithm is an implementation of Evolutionary Programming based
on the classical implementation for continuous function optimization
by Fogel et al. [4] with per-variable adaptive variance based on Fogel's
description for a self-adaptive variation on page 160 of his 1995 book
[3].

# Cost function: sum of squares (minimum of 0.0 at the origin).
def objective_function(vector)
  return vector.inject(0.0) {|sum, x| sum + (x ** 2.0)}
end

# Uniform random point within the per-dimension [min, max] bounds.
def random_vector(minmax)
  return Array.new(minmax.size) do |i|
    minmax[i][0] + ((minmax[i][1] - minmax[i][0]) * rand())
  end
end

# Gaussian random number via the polar form of the Box-Muller transform.
def random_gaussian(mean=0.0, stdev=1.0)
  u1 = u2 = w = 0
  begin
    u1 = 2 * rand() - 1
    u2 = 2 * rand() - 1
    w = u1 * u1 + u2 * u2
  end while w >= 1
  w = Math.sqrt((-2.0 * Math.log(w)) / w)
  return mean + (u2 * w) * stdev
end

# Perturb each variable by its own strategy parameter (standard deviation),
# then perturb the strategy parameter itself.
def mutate(candidate, search_space)
  child = {:vector=>[], :strategy=>[]}
  candidate[:vector].each_with_index do |v_old, i|
    s_old = candidate[:strategy][i]
    v = v_old + s_old * random_gaussian()
    v = search_space[i][0] if v < search_space[i][0]
    v = search_space[i][1] if v > search_space[i][1]
    child[:vector] << v
    child[:strategy] << s_old + random_gaussian() * s_old.abs**0.5
  end
  return child
end

# Count how many of bout_size random opponents the candidate beats.
def tournament(candidate, population, bout_size)
  candidate[:wins] = 0
  bout_size.times do |i|
    other = population[rand(population.size)]
    candidate[:wins] += 1 if candidate[:fitness] < other[:fitness]
  end
end

def init_population(minmax, pop_size)
  strategy = Array.new(minmax.size) do |i|
    [0, (minmax[i][1]-minmax[i][0]) * 0.05]
  end
  # The block form gives each member its own hash; Array.new(pop_size, {})
  # would make every member share a single hash object.
  pop = Array.new(pop_size) { {} }
  pop.each_index do |i|
    pop[i][:vector] = random_vector(minmax)
    pop[i][:strategy] = random_vector(strategy)
  end
  pop.each{|c| c[:fitness] = objective_function(c[:vector])}
  return pop
end

def search(max_gens, search_space, pop_size, bout_size)
  population = init_population(search_space, pop_size)
  population.each{|c| c[:fitness] = objective_function(c[:vector])}
  best = population.sort{|x,y| x[:fitness] <=> y[:fitness]}.first
  max_gens.times do |gen|
    children = Array.new(pop_size) {|i| mutate(population[i], search_space)}
    children.each{|c| c[:fitness] = objective_function(c[:vector])}
    children.sort!{|x,y| x[:fitness] <=> y[:fitness]}
    best = children.first if children.first[:fitness] < best[:fitness]
    union = children+population
    union.each{|c| tournament(c, union, bout_size)}
    union.sort!{|x,y| y[:wins] <=> x[:wins]}
    population = union.first(pop_size)
    puts " > gen #{gen}, fitness=#{best[:fitness]}"
  end
  return best
end

if __FILE__ == $0
  # problem configuration
  problem_size = 2
  search_space = Array.new(problem_size) {|i| [-5, +5]}
  # algorithm configuration
  max_gens = 200
  pop_size = 100
  bout_size = 5
  # execute the algorithm
  best = search(max_gens, search_space, pop_size, bout_size)
  puts "done! Solution: f=#{best[:fitness]}, s=#{best[:vector].inspect}"
end

Listing 3.5: Evolutionary Programming in Ruby 

3.6.8 References 
Primary Sources 

Evolutionary Programming was developed by Lawrence Fogel, outlined
in early papers (such as [5]) and later became the focus of his PhD
dissertation [6]. Fogel focused on the use of an evolutionary process for the
development of control systems using Finite State Machine (FSM)
representations. Fogel's early work on Evolutionary Programming culminated
in a book (co-authored with Owens and Walsh) that elaborated the
approach, focusing on the evolution of state machines for the prediction
of symbols in time series data [9].

Learn More 

The field of Evolutionary Programming lay relatively dormant for 30 
years until it was revived by Fogel's son, David. Early works considered 
the application of Evolutionary Programming to control systems [11], 
and later function optimization (system identification) culminating in 
a book on the approach [1], and David Fogel's PhD dissertation [2]. 
Lawrence Fogel collaborated in the revival of the technique, including 
reviews [7, 8] and extensions on what became the focus of the approach 
on function optimization [4].

Yao et al. provide a seminal study of Evolutionary Programming 
proposing an extension and racing it against the classical approach on a 
large number of test problems [12]. Finally, Porto provides an excellent 
contemporary overview of the field and the technique [10].

3.7 Grammatical Evolution

Grammatical Evolution, GE. 

3.7.1 Taxonomy 

Grammatical Evolution is a Global Optimization technique and an 
instance of an Evolutionary Algorithm from the field of Evolutionary 
Computation. It may also be considered an algorithm for Automatic 
Programming. Grammatical Evolution is related to other Evolutionary 
Algorithms for evolving programs such as Genetic Programming (Sec- 
tion 3.3) and Gene Expression Programming (Section 3.8), as well as 
the classical Genetic Algorithm that uses binary strings (Section 3.2). 

3.7.2 Inspiration 

The Grammatical Evolution algorithm is inspired by the biological 
process used for generating a protein from genetic material as well as 
the broader genetic evolutionary process. The genome is comprised of 
DNA as a string of building blocks that are transcribed to RNA. RNA 
codons are in turn translated into sequences of amino acids and used in 
the protein. The resulting protein in its environment is the phenotype. 

3.7.3 Metaphor 

The phenotype is a computer program that is created from a binary 
string-based genome. The genome is decoded into a sequence of integers 
that are in turn mapped onto pre-defined rules that make up the program.
The mapping from genotype to the phenotype is a one-to-many process 
that uses a wrapping feature. This is like the biological process observed 
in many bacteria, viruses, and mitochondria, where the same genetic 
material is used in the expression of different genes. The mapping adds 
robustness to the process both in the ability to adopt structure-agnostic 
genetic operators used during the evolutionary process on the sub- 
symbolic representation and the transcription of well-formed executable 
programs from the representation. 

3.7.4 Strategy 

The objective of Grammatical Evolution is to adapt an executable
program to a problem-specific objective function. This is achieved
through an iterative process with surrogates of evolutionary mechanisms
such as descent with variation, genetic mutation and recombination, and
genetic transcription and gene expression. A population of programs
is evolved in a sub-symbolic form as variable-length binary strings
and mapped to a symbolic and well-structured form as a context-free
grammar for execution.

3.7.5 Procedure 

A grammar is defined in Backus Normal Form (BNF), which is a context-free
grammar expressed as a series of production rules comprised of
terminals and non-terminals. A variable-length binary string
representation is used for the optimization process. Bits are read from a
candidate solution's genome in blocks of 8 called codons, and each codon is
decoded to an integer (in the range between 0 and $2^8 - 1$). If the end of the
binary string is reached when reading integers, the reading process loops
back to the start of the string, effectively creating a circular genome.
The integers are mapped to expressions from the BNF until a complete
syntactically correct expression is formed. This may not use a solution's
entire genome, or may use the decoded genome more than once given its
circular nature. Algorithm 3.7.1 provides a pseudocode listing of the
Grammatical Evolution algorithm for minimizing a cost function.
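
The codon decoding and circular read described above can be sketched in a few lines. The helper names decode_codons and read_codon are hypothetical, and this sketch assumes each codon is read most significant bit first, which is only one possible convention (the code listing later in this section weights the bits in the opposite order).

```ruby
# Decode a bitstring into one integer per whole codon (MSB first, assumed).
def decode_codons(bitstring, codon_bits=8)
  (bitstring.size / codon_bits).times.map do |i|
    bitstring[i * codon_bits, codon_bits].to_i(2)
  end
end

# Read the n-th integer, wrapping back to the start of the genome when
# the end is passed (the circular genome).
def read_codon(integers, n)
  integers[n % integers.size]
end

genome = "00000011" + "11111111"
ints = decode_codons(genome)
p ints                  # => [3, 255]
p read_codon(ints, 2)   # => 3 (third read wraps to the first codon)
```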

3.7.6 Heuristics 

• Grammatical Evolution was designed to optimize programs (such 
as mathematical equations) to specific cost functions. 

• Classical genetic operators used by the Genetic Algorithm may 
be used in the Grammatical Evolution algorithm, such as point 
mutations and one-point crossover. 

• Codons (groups of bits mapped to an integer) are commonly fixed
at 8 bits, providing a range of integers $\in [0, 2^8 - 1]$ that is scaled to
the range of rules using a modulo function.

• Additional genetic operators may be used with variable-length 
representations such as codon segments, duplication (add to the 
end), number of codons selected at random, and deletion. 

Algorithm 3.7.1: Pseudocode for Grammatical Evolution.
Input: Grammar, Codon_numbits, Population_size, P_crossover,
       P_mutation, P_delete, P_duplicate
Output: S_best

Population ← InitializePopulation(Population_size, Codon_numbits);
foreach S_i ∈ Population do
    S_i.integers ← Decode(S_i.bitstring, Codon_numbits);
    S_i.program ← Map(S_i.integers, Grammar);
    S_i.cost ← Execute(S_i.program);
end
S_best ← GetBestSolution(Population);
while ¬StopCondition() do
    Parents ← SelectParents(Population, Population_size);
    Children ← ∅;
    foreach Parent_i, Parent_j ∈ Parents do
        S_i ← Crossover(Parent_i, Parent_j, P_crossover);
        S_i.bitstring ← CodonDeletion(S_i.bitstring, P_delete);
        S_i.bitstring ← CodonDuplication(S_i.bitstring, P_duplicate);
        S_i.bitstring ← Mutate(S_i.bitstring, P_mutation);
        Children ← S_i;
    end
    foreach S_i ∈ Children do
        S_i.integers ← Decode(S_i.bitstring, Codon_numbits);
        S_i.program ← Map(S_i.integers, Grammar);
        S_i.cost ← Execute(S_i.program);
    end
    S_best ← GetBestSolution(Children);
    Population ← Replace(Population, Children);
end
return S_best;

3.7.7 Code Listing

Listing 3.6 provides an example of the Grammatical Evolution algorithm
implemented in the Ruby Programming Language based on the version
described by O'Neill and Ryan [5]. The demonstration problem is an
instance of symbolic regression $f(x) = x^4 + x^3 + x^2 + x$, where $x \in [1, 10]$.
The grammar used in this problem is:

• Non-terminals: N = {expr, op, pre_op}

• Terminals: T = {+, −, ÷, ×, x, 1.0}

• Expression (program): S = <expr>

The production rules for the grammar in BNF are:

• <expr> ::= <expr><op><expr>, (<expr><op><expr>), <pre_op>(<expr>),
<var>

• <op> ::= +, −, ÷, ×

• <var> ::= x, 1.0
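
To make the mapping from integers to production rules concrete, the following is a minimal sketch of a leftmost derivation over a grammar of this shape. It is an illustration under stated assumptions rather than the listing's method: the GRAMMAR constant and the derive function are hypothetical names, the pre_op rule is left out because its alternatives are not given here, ASCII operators stand in for the arithmetic symbols, and a step limit replaces any depth control.

```ruby
# Assumed grammar; each non-terminal maps to its list of alternatives.
GRAMMAR = {
  "<expr>" => ["<expr><op><expr>", "(<expr><op><expr>)", "<var>"],
  "<op>"   => ["+", "-", "/", "*"],
  "<var>"  => ["x", "1.0"]
}

# Repeatedly expand the leftmost non-terminal. The rule chosen is the next
# integer modulo the number of alternatives; reads wrap around the genome.
def derive(grammar, integers, start="<expr>", max_steps=100)
  program, offset = start.dup, 0
  max_steps.times do
    match = program.match(/<\w+>/)
    return program if match.nil?       # no non-terminals left: complete
    rules = grammar[match[0]]
    choice = rules[integers[offset % integers.size] % rules.size]
    program = program.sub(match[0]) { choice }  # replaces leftmost only
    offset += 1
  end
  return nil                            # did not terminate within max_steps
end

p derive(GRAMMAR, [0, 2, 0, 0, 2, 1])   # => "x+1.0"
p derive(GRAMMAR, [0, 2, 4])            # => "x+x" (genome read twice)
```

The second call uses only three integers, so the derivation wraps around and reuses them, which is the circular-genome behaviour described in the procedure.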

The algorithm uses point mutation and a codon-respecting one-point
crossover operator. Binary tournament selection is used to determine the
parent population's contribution to the subsequent generation. Binary
strings are decoded to integers using an unsigned binary encoding. Candidate
solutions are then mapped directly into executable Ruby code and
executed. A given candidate solution is evaluated by comparing its
output against the target function and taking the sum of the absolute
errors over a number of trials. The probabilities of point mutation, codon
deletion, and codon duplication are hard coded as relative probabilities
to each solution, although they should be parameters of the algorithm. In
this case they are heuristically defined as $\frac{1.0}{L}$, $\frac{0.5}{NC}$ and $\frac{1.0}{NC}$ respectively,
where L is the total number of bits, and NC is the number of codons
in a given candidate solution.
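
As a small worked example of these relative probabilities, the sketch below computes them for one candidate, assuming rates of 1.0/L for point mutation, 0.5/NC for codon deletion and 1.0/NC for codon duplication. The function name operator_rates is hypothetical. Note that the default arguments in the listing that follows divide by the codon size in bits rather than by the number of codons, so the values there can differ from this sketch.

```ruby
# Hypothetical helper: per-solution operator probabilities.
def operator_rates(bitstring, codon_bits)
  l  = bitstring.size.to_f                  # L: total number of bits
  nc = (bitstring.size / codon_bits).to_f   # NC: number of whole codons
  return {:mutation=>1.0/l, :deletion=>0.5/nc, :duplication=>1.0/nc}
end

# 10 codons of 4 bits each: L = 40, NC = 10
p operator_rates("01" * 20, 4)
```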

Solutions are evaluated by generating a number of random samples 
from the domain and calculating the mean error of the program to 
the expected outcome. Programs that contain a single term or those 
that return an invalid (NaN) or infinite result are penalized with an 
enormous error value. The implementation uses a maximum depth in 
the expression tree, whereas traditionally such deep expression trees are 
marked as invalid. Programs that resolve to the bare input variable (and
so simply return the input) are penalized.

# Pick two distinct members at random and return the fitter one.
def binary_tournament(pop)
  i, j = rand(pop.size), rand(pop.size)
  j = rand(pop.size) while j==i
  return (pop[i][:fitness] < pop[j][:fitness]) ? pop[i] : pop[j]
end

# Flip each bit independently with probability rate.
def point_mutation(bitstring, rate=1.0/bitstring.size.to_f)
  child = ""
  bitstring.size.times do |i|
    bit = bitstring[i].chr
    child << ((rand()<rate) ? ((bit=='1') ? "0" : "1") : bit)
  end
  return child
end

# One-point crossover with the cut point aligned to a codon boundary.
def one_point_crossover(parent1, parent2, codon_bits, p_cross=0.30)
  return ""+parent1[:bitstring] if rand()>=p_cross
  cut = rand([parent1[:bitstring].size,
    parent2[:bitstring].size].min/codon_bits)
  cut *= codon_bits
  p2size = parent2[:bitstring].size
  return parent1[:bitstring][0...cut]+parent2[:bitstring][cut...p2size]
end

# Append a copy of a randomly selected codon to the end of the genome.
def codon_duplication(bitstring, codon_bits, rate=1.0/codon_bits.to_f)
  return bitstring if rand() >= rate
  codons = bitstring.size/codon_bits
  return bitstring + bitstring[rand(codons)*codon_bits, codon_bits]
end

# Remove a randomly selected codon from the genome.
def codon_deletion(bitstring, codon_bits, rate=0.5/codon_bits.to_f)
  return bitstring if rand() >= rate
  codons = bitstring.size/codon_bits
  off = rand(codons)*codon_bits
  return bitstring[0...off] + bitstring[off+codon_bits...bitstring.size]
end

def reproduce(selected, pop_size, p_cross, codon_bits)
  children = []
  selected.each_with_index do |p1, i|
    p2 = (i.modulo(2)==0) ? selected[i+1] : selected[i-1]
    p2 = selected[0] if i == selected.size-1
    child = {}
    child[:bitstring] = one_point_crossover(p1, p2, codon_bits, p_cross)
    child[:bitstring] = codon_deletion(child[:bitstring], codon_bits)
    child[:bitstring] = codon_duplication(child[:bitstring], codon_bits)
    child[:bitstring] = point_mutation(child[:bitstring])
    children << child
    break if children.size == pop_size
  end
  return children
end

def random_bitstring(num_bits)
  return (0...num_bits).inject(""){|s,i| s<<((rand<0.5) ? "1" : "0")}
end

# Decode each whole codon to an unsigned integer.
def decode_integers(bitstring, codon_bits)
  ints = []
  (bitstring.size/codon_bits).times do |off|
    codon = bitstring[off*codon_bits, codon_bits]
    sum = 0
    codon.size.times do |i|
      sum += ((codon[i].chr=='1') ? 1 : 0) * (2 ** i);
    end
    ints << sum
  end
  return ints
end

# Expand non-terminals using the integers (modulo the number of rules),
# wrapping around the genome and forcing VAR once max_depth is reached.
def map(grammar, integers, max_depth)
  done, offset, depth = false, 0, 0
  symbolic_string = grammar["S"]
  begin
    done = true
    grammar.keys.each do |key|
      symbolic_string = symbolic_string.gsub(key) do |k|
        done = false
        set = (k=="EXP" && depth>=max_depth-1) ? grammar["VAR"] : grammar[k]
        integer = integers[offset].modulo(set.size)
        offset = (offset==integers.size-1) ? 0 : offset+1
        set[integer]
      end
    end
    depth += 1
  end until done
  return symbolic_string
end

def target_function(x)
  return x**4.0 + x**3.0 + x**2.0 + x
end

def sample_from_bounds(bounds)
  return bounds[0] + ((bounds[1] - bounds[0]) * rand())
end

# Mean absolute error over random samples; degenerate programs are penalized.
def cost(program, bounds, num_trials=30)
  return 9999999 if program.strip == "INPUT"
  sum_error = 0.0
  num_trials.times do
    x = sample_from_bounds(bounds)
    expression = program.gsub("INPUT", x.to_s)
    begin score = eval(expression) rescue score = 0.0/0.0 end
    return 9999999 if score.nan? or score.infinite?
    sum_error += (score - target_function(x)).abs
  end
  return sum_error / num_trials.to_f
end

def evaluate(candidate, codon_bits, grammar, max_depth, bounds)
  candidate[:integers] = decode_integers(candidate[:bitstring], codon_bits)
  candidate[:program] = map(grammar, candidate[:integers], max_depth)
  candidate[:fitness] = cost(candidate[:program], bounds)
end

def search(max_gens, pop_size, codon_bits, num_bits, p_cross, grammar,
    max_depth, bounds)
  pop = Array.new(pop_size) {|i| {:bitstring=>random_bitstring(num_bits)}}
  pop.each{|c| evaluate(c, codon_bits, grammar, max_depth, bounds)}
  best = pop.sort{|x,y| x[:fitness] <=> y[:fitness]}.first
  max_gens.times do |gen|
    selected = Array.new(pop_size){|i| binary_tournament(pop)}
    children = reproduce(selected, pop_size, p_cross, codon_bits)
    children.each{|c| evaluate(c, codon_bits, grammar, max_depth, bounds)}
    children.sort!{|x,y| x[:fitness] <=> y[:fitness]}
    best = children.first if children.first[:fitness] <= best[:fitness]
    pop=(children+pop).sort{|x,y| x[:fitness]<=>y[:fitness]}.first(pop_size)
    puts " > gen=#{gen}, f=#{best[:fitness]}, s=#{best[:bitstring]}"
    break if best[:fitness] == 0.0
  end
  return best
end

if __FILE__ == $0
  # problem configuration
  grammar = {"S"=>"EXP",
    "EXP"=>[" EXP BINARY EXP ", " (EXP BINARY EXP) ", " VAR "],
    "BINARY"=>["+", "-", "/", "*" ],
    "VAR"=>["INPUT", "1.0"]}
  bounds = [1, 10]
  # algorithm configuration
  max_depth = 7
  max_gens = 50
  pop_size = 100
  codon_bits = 4
  num_bits = 10*codon_bits
  p_cross = 0.30
  # execute the algorithm
  best = search(max_gens, pop_size, codon_bits, num_bits, p_cross,
    grammar, max_depth, bounds)
  puts "done! Solution: f=#{best[:fitness]}, s=#{best[:program]}"
end



Listing 3.6: Grammatical Evolution in Ruby 



3.7.8 References 
Primary Sources 

Grammatical Evolution was proposed by Ryan, Collins and O'Neill in a
seminal conference paper that applied the approach to a symbolic
regression problem [7]. The approach was born out of the desire for syntax
preservation while evolving programs using the Genetic Programming
algorithm. This seminal work was followed by application papers for a
symbolic integration problem [2, 3] and solving trigonometric identities
[8].



Learn More 

O'Neill and Ryan provide a high-level introduction to Grammatical 
Evolution and early demonstration applications [4]. The same authors 
provide a thorough introduction to the technique and overview of the 
state of the field [5]. O'Neill and Ryan present a seminal reference
for Grammatical Evolution in their book [6]. A second more recent 
book considers extensions to the approach improving its capability on 
dynamic problems [1].

3.8 Gene Expression Programming

Gene Expression Programming, GEP. 

3.8.1 Taxonomy 

Gene Expression Programming is a Global Optimization algorithm and 
an Automatic Programming technique, and it is an instance of an Evo- 
lutionary Algorithm from the field of Evolutionary Computation. It 
is a sibling of other Evolutionary Algorithms such as the Genetic
Algorithm (Section 3.2) as well as other Evolutionary Automatic Pro- 
gramming techniques such as Genetic Programming (Section 3.3) and 
Grammatical Evolution (Section 3.7). 

3.8.2 Inspiration 

Gene Expression Programming is inspired by the replication and expres- 
sion of the DNA molecule, specifically at the gene level. The expression 
of a gene involves the transcription of its DNA to RNA which in turn 
forms amino acids that make up proteins in the phenotype of an organ- 
ism. The DNA building blocks are subjected to mechanisms of variation 
(mutations such as copying errors) as well as recombination during sexual
reproduction. 

3.8.3 Metaphor 

Gene Expression Programming uses a linear genome as the basis for 
genetic operators such as mutation, recombination, inversion, and trans- 
position. The genome is comprised of chromosomes and each chromo- 
some is comprised of genes that are translated into an expression tree 
to solve a given problem. The robust gene definition means that genetic 
operators can be applied to the sub-symbolic representation without 
concern for the structure of the resultant gene expression, providing 
separation of genotype and phenotype. 

3.8.4 Strategy 

The objective of the Gene Expression Programming algorithm is to 
improve the adaptive fit of an expressed program in the context of a 
problem specific cost function. This is achieved through the use of an 
evolutionary process that operates on a sub-symbolic representation of 
candidate solutions using surrogates for the processes (descent with mod- 
ification) and mechanisms (genetic recombination, mutation, inversion, 
transposition, and gene expression) of evolution.

3.8.5 Procedure

A candidate solution is represented as a linear string of symbols called 
Karva notation or a K-expression, where each symbol maps to a function 
or terminal node. The linear representation is mapped to an expression 
tree in a breadth-first manner. A K-expression has fixed length and is
comprised of one or more sub-expressions (genes), which are also defined 
with a fixed length. A gene is comprised of two sections, a head which 
may contain any function or terminal symbols, and a tail section that 
may only contain terminal symbols. Each gene will always translate 
to a syntactically correct expression tree, where the tail portion of the 
gene provides a genetic buffer which ensures closure of the expression. 
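
The breadth-first mapping from a K-expression to an expression tree can be sketched as follows. This is an illustrative sketch, not the listing's implementation: the ARITY table, the function names and the single-character symbols are assumptions, and every function is taken to be binary.

```ruby
# Assumed function set; any symbol not listed is treated as a terminal.
ARITY = {"+"=>2, "-"=>2, "*"=>2, "/"=>2}

# Build the tree level by level: each node taken from the queue consumes
# as many of the following symbols as its arity requires.
def decode_breadth_first(kexpr)
  symbols = kexpr.chars
  root = {:node=>symbols.shift, :args=>[]}
  queue = [root]
  until queue.empty?
    current = queue.shift
    ARITY.fetch(current[:node], 0).times do
      child = {:node=>symbols.shift, :args=>[]}
      current[:args] << child
      queue << child
    end
  end
  return root
end

# Walk the tree depth first to produce an infix expression string.
def to_infix(tree)
  return tree[:node] if tree[:args].empty?
  left, right = tree[:args].map { |arg| to_infix(arg) }
  return "(#{left} #{tree[:node]} #{right})"
end

# Head "+*x" (length 3) followed by a tail of four terminals.
puts to_infix(decode_breadth_first("+*xxxxx"))   # => ((x * x) + x)
```

Only the first five symbols are consumed in this example; the remaining tail symbols are the unused buffer that keeps the expression closed whatever the head contains.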

Algorithm 3.8.1 provides a pseudocode listing of the Gene Expression 
Programming algorithm for minimizing a cost function. 



Algorithm 3.8.1: Pseudocode for GEP.
Input: Grammar, Population_size, Head_length, Tail_length,
       P_crossover, P_mutation
Output: S_best

Population ← InitializePopulation(Population_size, Grammar,
    Head_length, Tail_length);
foreach S_i ∈ Population do
    S_i.program ← DecodeBreadthFirst(S_i.genome, Grammar);
    S_i.cost ← Execute(S_i.program);
end
S_best ← GetBestSolution(Population);
while ¬StopCondition() do
    Parents ← SelectParents(Population, Population_size);
    Children ← ∅;
    foreach Parent_1, Parent_2 ∈ Parents do
        S_i.genome ← Crossover(Parent_1, Parent_2, P_crossover);
        S_i.genome ← Mutate(S_i.genome, P_mutation);
        Children ← S_i;
    end
    foreach S_i ∈ Children do
        S_i.program ← DecodeBreadthFirst(S_i.genome, Grammar);
        S_i.cost ← Execute(S_i.program);
    end
    Population ← Replace(Population, Children);
    S_best ← GetBestSolution(Children);
end
return S_best;

3.8.6 Heuristics

• The length of a chromosome is defined by the number of genes,
where a gene length is defined by $h + t$. The h is a user defined
parameter (such as 10), and t is defined as $t = h(n - 1) + 1$, where
the n represents the maximum arity of functional nodes in the
expression (such as 2 if the arithmetic functions $\times, \div, -, +$ are
used).

• The mutation operator substitutes expressions along the genome, 
although must respect the gene rules such that function and 
terminal nodes are mutated in the head of genes, whereas only 
terminal nodes are substituted in the tail of genes. 

• Crossover occurs between two selected parents from the population 
and can occur based on a one-point cross, two point cross, or a 
gene-based approach where genes are selected from the parents 
with uniform probability. 

• An inversion operator may be used with a low probability that 
reverses a small sequence of symbols (1-3) within a section of a 
gene (tail or head). 

• A transposition operator may be used that has a number of different
modes, including: duplicating a small sequence (1-3 symbols) from
somewhere on a gene to the head, copying a small sequence on a gene to the
root of the gene, and moving entire genes in the chromosome.
In the case of intra-gene transpositions, the sequence in the head
of the gene is moved down to accommodate the copied sequence
and the length of the head is truncated to maintain consistent
gene sizes.

• A '?' may be included in the terminal set that represents a 
numeric constant from an array that is evolved on the end of 
the genome. The constants are read from the end of the genome 
and are substituted for '?' as the expression tree is created (in 
breadth first order). Finally the numeric constants are used as 
array indices in yet another chromosome of numerical values which 
are substituted into the expression tree. 

• Mutation is low (such as $\frac{1}{L}$, where $L$ is the length of the
genome), selection can be any of the classical approaches (such as
roulette wheel or tournament), and crossover rates are typically high
(0.7 of offspring).

• Use multiple sub-expressions linked together on hard problems
when one gene is not sufficient to address the problem. The
sub-expressions are linked using link expressions which are function
nodes that are either statically defined (such as a conjunction) or
evolved on the genome with the genes.
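As a small worked example of the gene-sizing rule in the first heuristic, the following sketch computes the tail length $t = h(n - 1) + 1$ and the resulting gene length $h + t$. The helper names `tail_length` and `gene_length` are illustrative assumptions and are not taken from Ferreira's description.

```ruby
# Tail length needed so that every function in the head can be given arguments.
# h: head length, n: maximum arity of the function set.
def tail_length(h, n)
  h * (n - 1) + 1
end

# Total number of symbols in one gene (head plus tail).
def gene_length(h, n)
  h + tail_length(h, n)
end

puts tail_length(10, 2)  # head of 10 symbols with binary functions
puts gene_length(10, 2)
```

With a head of 10 symbols and binary arithmetic functions the tail holds 11 terminals, so even a head made entirely of functions can always be completed into a valid expression tree.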

3.8.7 Code Listing 

Listing 3.7 provides an example of the Gene Expression Programming
algorithm implemented in the Ruby Programming Language based
on the seminal version proposed by Ferreira [1]. The demonstration
problem is an instance of symbolic regression $f(x) = x^4 + x^3 + x^2 + x$,
where $x \in [1, 10]$. The grammar used in this problem is: Functions:
$F = \{+, -, \div, \times\}$ and Terminals: $T = \{x\}$.

The algorithm uses binary tournament selection, uniform crossover 
and point mutations. The K-expression is decoded to an expression 
tree in a breadth-first manner, which is then parsed depth first as a 
Ruby expression string for display and direct evaluation. Solutions are 
evaluated by generating a number of random samples from the domain 
and calculating the mean error of the program to the expected outcome. 
Programs that contain a single term or those that return an invalid 
(NaN) or infinite result are penalized with an enormous error value. 

def binary_tournament(pop)
  # pick two random candidates; the one with the lower error wins
  i, j = rand(pop.size), rand(pop.size)
  return (pop[i][:fitness] < pop[j][:fitness]) ? pop[i] : pop[j]
end

def point_mutation(grammar, genome, head_length, rate=1.0/genome.size.to_f)
  child = ""
  genome.size.times do |i|
    bit = genome[i].chr
    if rand() < rate
      if i < head_length
        # head positions may hold functions or terminals
        selection = (rand() < 0.5) ? grammar["FUNC"] : grammar["TERM"]
        bit = selection[rand(selection.size)]
      else
        # tail positions may only hold terminals
        bit = grammar["TERM"][rand(grammar["TERM"].size)]
      end
    end
    child << bit
  end
  return child
end

def crossover(parent1, parent2, rate)
  return ""+parent1 if rand()>=rate
  child = ""
  parent1.size.times do |i|
    child << ((rand()<0.5) ? parent1[i] : parent2[i])
  end
  return child
end

def reproduce(grammar, selected, pop_size, p_crossover, head_length)
  children = []
  selected.each_with_index do |p1, i|
    # pair each parent with its neighbor in the selected list
    p2 = (i.modulo(2)==0) ? selected[i+1] : selected[i-1]
    p2 = selected[0] if i == selected.size-1
    child = {}
    child[:genome] = crossover(p1[:genome], p2[:genome], p_crossover)
    child[:genome] = point_mutation(grammar, child[:genome], head_length)
    children << child
  end
  return children
end

def random_genome(grammar, head_length, tail_length)
  s = ""
  head_length.times do
    selection = (rand() < 0.5) ? grammar["FUNC"] : grammar["TERM"]
    s << selection[rand(selection.size)]
  end
  tail_length.times { s << grammar["TERM"][rand(grammar["TERM"].size)]}
  return s
end

def target_function(x)
  return x**4.0 + x**3.0 + x**2.0 + x
end

def sample_from_bounds(bounds)
  return bounds[0] + ((bounds[1] - bounds[0]) * rand())
end

def cost(program, bounds, num_trials=30)
  errors = 0.0
  num_trials.times do
    x = sample_from_bounds(bounds)
    expression, score = program.gsub("x", x.to_s), 0.0
    begin score = eval(expression) rescue score = 0.0/0.0 end
    return 9999999 if score.nan? or score.infinite?
    errors += (score - target_function(x)).abs
  end
  return errors / num_trials.to_f
end
def mapping(genome, grammar)
  off, queue = 0, []
  root = {}
  root[:node] = genome[off].chr; off+=1
  queue.push(root)
  while !queue.empty? do
    current = queue.shift
    if grammar["FUNC"].include?(current[:node])
      current[:left] = {}
      current[:left][:node] = genome[off].chr; off+=1
      queue.push(current[:left])
      current[:right] = {}
      current[:right][:node] = genome[off].chr; off+=1
      queue.push(current[:right])
    end
  end
  return root
end

def tree_to_string(exp)
  return exp[:node] if (exp[:left].nil? or exp[:right].nil?)
  left = tree_to_string(exp[:left])
  right = tree_to_string(exp[:right])
  return "(#{left} #{exp[:node]} #{right})"
end

def evaluate(candidate, grammar, bounds)
  candidate[:expression] = mapping(candidate[:genome], grammar)
  candidate[:program] = tree_to_string(candidate[:expression])
  candidate[:fitness] = cost(candidate[:program], bounds)
end

def search(grammar, bounds, h_length, t_length, max_gens, pop_size, p_cross)
  pop = Array.new(pop_size) do
    {:genome=>random_genome(grammar, h_length, t_length)}
  end
  pop.each{|c| evaluate(c, grammar, bounds)}
  best = pop.sort{|x,y| x[:fitness] <=> y[:fitness]}.first
  max_gens.times do |gen|
    # draw pop_size parents by binary tournament
    selected = Array.new(pop_size){|i| binary_tournament(pop)}
    children = reproduce(grammar, selected, pop_size, p_cross, h_length)
    children.each{|c| evaluate(c, grammar, bounds)}
    children.sort!{|x,y| x[:fitness] <=> y[:fitness]}
    best = children.first if children.first[:fitness] <= best[:fitness]
    pop = (children+pop).first(pop_size)
    puts " > gen=#{gen}, f=#{best[:fitness]}, g=#{best[:genome]}"
  end
  return best
end

if __FILE__ == $0
  # problem configuration
  grammar = {"FUNC"=>["+","-","*","/"], "TERM"=>["x"]}
  bounds = [1.0, 10.0]
  # algorithm configuration
  h_length = 20
  t_length = h_length * (2-1) + 1
  max_gens = 150
  pop_size = 80
  p_cross = 0.85
  # execute the algorithm
  best = search(grammar, bounds, h_length, t_length, max_gens, pop_size, p_cross)
  puts "done! Solution: f=#{best[:fitness]}, program=#{best[:program]}"
end
3.8.8 References
Primary Sources

The Gene Expression Programming algorithm was proposed by Ferreira
in a paper that detailed the approach, provided a careful walkthrough
of the process and operators, and demonstrated the algorithm on a
number of benchmark problem instances including symbolic regression
[1].

Learn More 

Ferreira provided an early and detailed introduction and overview of
the approach as a book chapter, providing a step-by-step walkthrough of
the procedure and sample applications [2]. A more contemporary and
detailed introduction is provided in a later book chapter [3]. Ferreira
published a book on the approach in 2002 covering background, the
algorithm, and demonstration applications, which is now in its second
edition [4].

3.8.9 Bibliography 
3.9 Learning Classifier System

Learning Classifier System, LCS. 

3.9.1 Taxonomy 

The Learning Classifier System algorithm is both an instance of an 
Evolutionary Algorithm from the field of Evolutionary Computation 
and an instance of a Reinforcement Learning algorithm from Machine 
Learning. Internally, Learning Classifier Systems make use of a Ge- 
netic Algorithm (Section 3.2). The Learning Classifier System is a 
theoretical system with a number of implementations. The two main
approaches to implementing and investigating the system empirically are
the Pittsburgh-style that seeks to optimize the whole classifier, and the
Michigan-style that optimizes responsive rulesets. The Michigan-style
Learning Classifier System is the most common and comprises two versions:
the ZCS (zeroth-level classifier system) and the XCS (accuracy-based
classifier system).

3.9.2 Strategy 

The objective of the Learning Classifier System algorithm is to optimize 
payoff based on exposure to stimuli from a problem-specific environment. 
This is achieved by managing credit assignment for those rules that 
prove useful and searching for new rules and new variations on existing 
rules using an evolutionary process. 

3.9.3 Procedure 

The actors of the system include detectors, messages, effectors, feedback, 
and classifiers. Detectors are used by the system to perceive the state of 
the environment. Messages are the discrete information packets passed 
from the detectors into the system. The system performs information 
processing on messages, and messages may directly result in actions in 
the environment. Effectors control the actions of the system on and 
within the environment. In addition to the system actively perceiving via
its detectors, it may also receive directed feedback from the environment
(payoff). Classifiers are condition-action rules that provide a filter for
messages. If a message satisfies the conditional part of the classifier,
the action of the classifier triggers. Rules act as message processors.
A message is a fixed-length bitstring. A classifier is defined as a ternary
string with an alphabet $\in \{1, 0, \#\}$, where the # represents do not care
(matching either 1 or 0).
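The matching rule for a ternary condition can be illustrated with a few lines of Ruby. This is a minimal sketch; the method name `condition_matches?` is an assumption made for illustration.

```ruby
# True when every position of the condition is either '#' (do not care)
# or equal to the corresponding bit of the message.
def condition_matches?(condition, message)
  return false unless condition.size == message.size
  condition.chars.zip(message.chars).all? { |c, m| c == '#' || c == m }
end

puts condition_matches?("1#0", "110")  # '#' accepts the middle bit
puts condition_matches?("1#0", "111")  # the last bit differs
```

A condition with more # symbols is more general: "###" matches every 3-bit message, whereas "110" matches exactly one.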

The processing loop for the Learning Classifier System is as follows:

1. Messages from the environment are placed on the message list.

2. The conditions of each classifier are checked to see if they are
satisfied by at least one message in the message list.

3. All classifiers that are satisfied participate in a competition, those 
that win post their action to the message list. 

4. All messages directed to the effectors are executed (causing actions 
in the environment). 

5. All messages on the message list from the previous cycle are deleted 
(messages persist for a single cycle). 
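The five steps above can be sketched as a single cycle over a message list. This is a simplified illustration under stated assumptions rather than a faithful implementation of the theoretical system: classifiers are hashes with hypothetical `:condition`, `:action`, and `:strength` keys, the competition keeps only the single strongest satisfied classifier, and every posted message is treated as directed to the effectors.

```ruby
# True when a ternary condition ('0', '1', '#') accepts a bitstring message.
def accepts?(condition, message)
  condition.chars.zip(message.chars).all? { |c, m| c == '#' || c == m }
end

# One pass of the processing loop.
# Returns [executed_actions, message_list_for_next_cycle].
def run_cycle(classifiers, environment_messages)
  # 1. Messages from the environment are placed on the message list.
  message_list = environment_messages.dup
  # 2. Find the classifiers satisfied by at least one message.
  satisfied = classifiers.select do |c|
    message_list.any? { |m| accepts?(c[:condition], m) }
  end
  # 3. Competition (assumed here: highest strength wins); the winner posts.
  winner = satisfied.max_by { |c| c[:strength] }
  posted = winner.nil? ? [] : [winner[:action]]
  # 4. Messages directed to the effectors are executed (here: all posted ones).
  executed = posted.dup
  # 5. Messages from this cycle are dropped; only new postings carry over.
  [executed, posted]
end

rules = [
  { condition: "1#", action: "01", strength: 5.0 },
  { condition: "#0", action: "11", strength: 9.0 },
  { condition: "01", action: "00", strength: 7.0 }
]
executed, carried = run_cycle(rules, ["10"])
puts executed.inspect
```

In this example the message "10" satisfies the first two rules, and the stronger of the two posts its action. A fuller system would replace the winner-takes-all step with a bidding scheme and update strengths from payoff.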

The algorithm may be described in terms of the main processing 
loop and two sub-algorithms: a reinforcement learning algorithm such 
as the bucket brigade algorithm or Q-learning, and a genetic algorithm 
for optimization of the system. Algorithm 3.9.1 provides a pseudocode 
listing of the high-level processing loop of the Learning Classifier System, 
specifically the XCS as described by Butz and Wilson [3]. 

3.9.4 Heuristics 

The majority of the heuristics in this section are specific to the XCS 
Learning Classifier System as described by Butz and Wilson [3]. 

• Learning Classifier Systems are suited for problems with the follow- 
ing characteristics: perpetually novel events with significant noise, 
continual real-time requirements for action, implicitly or inexactly 
defined goals, and sparse payoff or reinforcement obtainable only 
through long sequences of tasks. 

• The learning rate $\beta$ for a classifier's expected payoff, error, and
fitness is typically in the range [0.1, 0.2].

• The frequency of running the genetic algorithm $\theta_{GA}$ should be in
the range [25, 50].

• The discount factor $\gamma$ used in multi-step programs is typically
around 0.71.

• The minimum error $\epsilon_0$ whereby classifiers are considered to have
equal accuracy is typically 10% of the maximum reward.

• The probability of crossover $\chi$ in the genetic algorithm is typically
in the range [0.5, 1.0].

• The probability $\mu$ of mutating a single position in a classifier in the
genetic algorithm is typically in the range [0.01, 0.05].
Algorithm 3.9.1: Pseudocode for the LCS.

Input: EnvironmentDetails
Output: Population
env ← InitializeEnvironment(EnvironmentDetails);
Population ← InitializePopulation();
ActionSet_{t-1} ← ∅;
Input_{t-1} ← ∅;
Reward_{t-1} ← ∅;
while ¬StopCondition() do
    Input_t ← env;
    Matchset ← GenerateMatchSet(Population, Input_t);
    Prediction ← GeneratePrediction(Matchset);
    Action ← SelectionAction(Prediction);
    ActionSet_t ← GenerateActionSet(Action, Matchset);
    Reward_t ← ExecuteAction(Action, env);
    if ActionSet_{t-1} ≠ ∅ then
        Payoff_t ← CalculatePayoff(Reward_{t-1}, Prediction);
        PerformLearning(ActionSet_{t-1}, Payoff_t, Population);
        RunGeneticAlgorithm(ActionSet_{t-1}, Input_{t-1}, Population);
    end
    if LastStepOfTask(env, Action) then
        Payoff_t ← Reward_t;
        PerformLearning(ActionSet_t, Payoff_t, Population);
        RunGeneticAlgorithm(ActionSet_t, Input_t, Population);
        ActionSet_{t-1} ← ∅;
    else
        ActionSet_{t-1} ← ActionSet_t;
        Input_{t-1} ← Input_t;
        Reward_{t-1} ← Reward_t;
    end
end



• The experience threshold during classifier deletion $\theta_{del}$ is typically
about 20.

• The experience threshold for a classifier during subsumption $\theta_{sub}$
is typically around 20.

• The initial values for a classifier's expected payoff $p_1$, error $\epsilon_1$, and
fitness $f_1$ are typically small and close to zero.

• The probability of selecting a random action for the purposes of
exploration $p_{exp}$ is typically close to 0.5.

• The minimum number of different actions that must be specified
in a match set $\theta_{mna}$ is usually the total number of possible actions
in the environment for the input.

• Subsumption should be used on problem domains that are known
to contain well defined rules for mapping inputs to outputs.
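To make the role of the learning rate $\beta$ concrete, the sketch below applies the delta-rule update used in XCS for a classifier's payoff prediction, in which the estimate moves a fraction $\beta$ of the way toward the observed payoff. The method name `widrow_hoff`, the starting prediction of 10 and the payoff of 1000 are assumptions chosen for illustration.

```ruby
# Move an estimate a fraction beta of the way toward a new observation.
def widrow_hoff(estimate, observed, beta)
  estimate + beta * (observed - estimate)
end

prediction = 10.0
3.times do
  prediction = widrow_hoff(prediction, 1000.0, 0.2)
  puts prediction.round(2)
end
```

With $\beta = 0.2$ the prediction closes one fifth of the remaining gap on every update, so smaller values of $\beta$ give smoother but slower estimates.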

3.9.5 Code Listing 

Listing 3.8 provides an example of the Learning Classifier System
algorithm implemented in the Ruby Programming Language. The problem
is an instance of a Boolean multiplexer called the 6-multiplexer. It can
be described as a classification problem, where each of the $2^6$ patterns
of bits is associated with a boolean class $\in \{1, 0\}$. For this problem
instance, the first two bits may be decoded as an address into the
remaining four bits that specify the class (for example in 100011, '10'
decodes to the index of '2' in the remaining 4 bits making the class
'1'). In propositional logic this problem instance may be described as
$F = (\neg x_0)(\neg x_1)x_2 + (\neg x_0)x_1x_3 + x_0(\neg x_1)x_4 + x_0x_1x_5$. The algorithm is
an instance of XCS based on the description provided by Butz and
Wilson [3] with the parameters based on the application of XCS to Boolean
multiplexer problems by Wilson [14, 15]. The population is grown as
needed, and subsumption, which would be appropriate for the Boolean
multiplexer problem, was not used for brevity. The multiplexer problem
is a single step problem, so the complexities of delayed payoff are not
required. A number of parameters were hard coded to recommended
values, specifically: $\alpha = 0.1$, $v = -0.5$, $\delta = 0.1$ and $P_\# = \frac{1}{3}$.
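The address-decoding view of the 6-multiplexer and the propositional formula describe the same function, which can be checked with a short sketch. The method names `multiplexer_class` and `multiplexer_formula` are illustrative assumptions, and the string indexing assumes Ruby 1.9 or later, where indexing a string returns a one-character string.

```ruby
# Class of a 6-bit input: the first two bits address one of the last four bits.
def multiplexer_class(bits)
  address = bits[0, 2].to_i(2)
  bits[2 + address].to_i
end

# The same function written as the sum-of-products formula from the text.
def multiplexer_formula(bits)
  x = bits.chars.map(&:to_i)
  n = ->(b) { 1 - b }
  n[x[0]] * n[x[1]] * x[2] + n[x[0]] * x[1] * x[3] +
    x[0] * n[x[1]] * x[4] + x[0] * x[1] * x[5]
end

puts multiplexer_class("100011")
```

For the worked example 100011 the address bits '10' select index 2 of the data bits '0011', giving class 1.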

def neg(bit)
  return (bit==1) ? 0 : 1
end

def target_function(s)
  ints = Array.new(6){|i| s[i].chr.to_i}
  x0,x1,x2,x3,x4,x5 = ints
  return neg(x0)*neg(x1)*x2 + neg(x0)*x1*x3 + x0*neg(x1)*x4 + x0*x1*x5
end

def new_classifier(condition, action, gen, p1=10.0, e1=0.0, f1=10.0)
  other = {}
  other[:condition],other[:action],other[:lasttime] = condition, action, gen
  other[:pred], other[:error], other[:fitness] = p1, e1, f1
  other[:exp], other[:setsize], other[:num] = 0.0, 1.0, 1.0
  return other
end

def copy_classifier(parent)
  copy = {}
  parent.keys.each do |k|
    copy[k] = (parent[k].kind_of? String) ? ""+parent[k] : parent[k]
  end
  copy[:num],copy[:exp] = 1.0, 0.0
  return copy
end

def random_bitstring(size=6)
  return (0...size).inject(""){|s,i| s+((rand<0.5) ? "1" : "0")}
end

def calculate_deletion_vote(classifier, pop, del_thresh, f_thresh=0.1)
  vote = classifier[:setsize] * classifier[:num]
  total = pop.inject(0.0){|s,c| s+c[:num]}
  avg_fitness = pop.inject(0.0){|s,c| s + (c[:fitness]/total)}
  derated = classifier[:fitness] / classifier[:num].to_f
  if classifier[:exp]>del_thresh and derated<(f_thresh*avg_fitness)
    return vote * (avg_fitness / derated)
  end
  return vote
end

def delete_from_pop(pop, pop_size, del_thresh=20.0)
  total = pop.inject(0) {|s,c| s+c[:num]}
  return if total <= pop_size
  pop.each {|c| c[:dvote] = calculate_deletion_vote(c, pop, del_thresh)}
  vote_sum = pop.inject(0.0) {|s,c| s+c[:dvote]}
  point = rand() * vote_sum
  vote_sum, index = 0.0, 0
  pop.each_with_index do |c,i|
    vote_sum += c[:dvote]
    if vote_sum >= point
      index = i
      break
    end
  end
  if pop[index][:num] > 1
    pop[index][:num] -= 1
  else
    pop.delete_at(index)
  end
end

def generate_random_classifier(input, actions, gen, rate=1.0/3.0)
  condition = ""
  input.size.times {|i| condition << ((rand<rate) ? '#' : input[i].chr)}
  action = actions[rand(actions.size)]
  return new_classifier(condition, action, gen)
end

def does_match?(input, condition)
  input.size.times do |i|
    return false if condition[i].chr!='#' and input[i].chr!=condition[i].chr
  end
  return true
end

def get_actions(pop)
  actions = []
  pop.each do |c|
    actions << c[:action] if !actions.include?(c[:action])
  end
  return actions
end

def generate_match_set(input, pop, all_actions, gen, pop_size)
  match_set = pop.select{|c| does_match?(input, c[:condition])}
  actions = get_actions(match_set)
  while actions.size < all_actions.size do
    remaining = all_actions - actions
    classifier = generate_random_classifier(input, remaining, gen)
    pop << classifier
    match_set << classifier
    delete_from_pop(pop, pop_size)
    actions << classifier[:action]
  end
  return match_set
end

def generate_prediction(match_set)
  pred = {}
  match_set.each do |classifier|
    key = classifier[:action]
    pred[key] = {:sum=>0.0,:count=>0.0,:weight=>0.0} if pred[key].nil?
    pred[key][:sum] += classifier[:pred]*classifier[:fitness]
    pred[key][:count] += classifier[:fitness]
  end
  pred.keys.each do |key|
    pred[key][:weight] = 0.0
    if pred[key][:count] > 0
      pred[key][:weight] = pred[key][:sum]/pred[key][:count]
    end
  end
  return pred
end

def select_action(predictions, p_explore=false)
  keys = Array.new(predictions.keys)
  return keys[rand(keys.size)] if p_explore
  keys.sort!{|x,y| predictions[y][:weight]<=>predictions[x][:weight]}
  return keys.first
end

def update_set(action_set, reward, beta=0.2)
  sum = action_set.inject(0.0) {|s,other| s+other[:num]}
  action_set.each do |c|
    c[:exp] += 1.0
    if c[:exp] < 1.0/beta
      c[:error] = (c[:error]*(c[:exp]-1.0)+(reward-c[:pred]).abs)/c[:exp]
      c[:pred] = (c[:pred] * (c[:exp]-1.0) + reward) / c[:exp]
      c[:setsize] = (c[:setsize]*(c[:exp]-1.0)+sum) / c[:exp]
    else
      c[:error] += beta * ((reward-c[:pred]).abs - c[:error])
      c[:pred] += beta * (reward-c[:pred])
      c[:setsize] += beta * (sum - c[:setsize])
    end
  end
end

def update_fitness(action_set, min_error=10, l_rate=0.2, alpha=0.1, v=-5.0)
  sum = 0.0
  acc = Array.new(action_set.size)
  action_set.each_with_index do |c,i|
    acc[i] = (c[:error]<min_error) ? 1.0 : alpha*(c[:error]/min_error)**v
    sum += acc[i] * c[:num].to_f
  end
  action_set.each_with_index do |c,i|
    c[:fitness] += l_rate * ((acc[i] * c[:num].to_f) / sum - c[:fitness])
  end
end

def can_run_genetic_algorithm(action_set, gen, ga_freq)
  return false if action_set.size <= 2
  total = action_set.inject(0.0) {|s,c| s+c[:lasttime]*c[:num]}
  sum = action_set.inject(0.0) {|s,c| s+c[:num]}
  return true if gen - (total/sum) > ga_freq
  return false
end

def binary_tournament(pop)
  i, j = rand(pop.size), rand(pop.size)
  j = rand(pop.size) while j==i
  return (pop[i][:fitness] > pop[j][:fitness]) ? pop[i] : pop[j]
end

def mutation(cl, action_set, input, rate=0.04)
  cl[:condition].size.times do |i|
    if rand() < rate
      cl[:condition][i] = (cl[:condition][i].chr=='#') ? input[i] : '#'
    end
  end
  if rand() < rate
    subset = action_set - [cl[:action]]
    cl[:action] = subset[rand(subset.size)]
  end
end

def uniform_crossover(parent1, parent2)
  child = ""
  parent1.size.times do |i|
    child << ((rand()<0.5) ? parent1[i].chr : parent2[i].chr)
  end
  return child
end

def insert_in_pop(cla, pop)
  pop.each do |c|
    if cla[:condition]==c[:condition] and cla[:action]==c[:action]
      c[:num] += 1
      return
    end
  end
  pop << cla
end

def crossover(c1, c2, p1, p2)
  c1[:condition] = uniform_crossover(p1[:condition], p2[:condition])
  c2[:condition] = uniform_crossover(p1[:condition], p2[:condition])
  c2[:pred] = c1[:pred] = (p1[:pred]+p2[:pred])/2.0
  c2[:error] = c1[:error] = 0.25*(p1[:error]+p2[:error])/2.0
  c2[:fitness] = c1[:fitness] = 0.1*(p1[:fitness]+p2[:fitness])/2.0
end

def run_ga(actions, pop, action_set, input, gen, pop_size, crate=0.8)
  p1, p2 = binary_tournament(action_set), binary_tournament(action_set)
  c1, c2 = copy_classifier(p1), copy_classifier(p2)
  crossover(c1, c2, p1, p2) if rand() < crate
  [c1,c2].each do |c|
    mutation(c, actions, input)
    insert_in_pop(c, pop)
  end
  while pop.inject(0) {|s,c| s+c[:num]} > pop_size
    delete_from_pop(pop, pop_size)
  end
end

def train_model(pop_size, max_gens, actions, ga_freq)
  pop, perf = [], []
  max_gens.times do |gen|
    explore = gen.modulo(2)==0
    input = random_bitstring()
    match_set = generate_match_set(input, pop, actions, gen, pop_size)
    pred_array = generate_prediction(match_set)
    action = select_action(pred_array, explore)
    reward = (target_function(input)==action.to_i) ? 1000.0 : 0.0
    if explore
      action_set = match_set.select{|c| c[:action]==action}
      update_set(action_set, reward)
      update_fitness(action_set)
      if can_run_genetic_algorithm(action_set, gen, ga_freq)
        action_set.each {|c| c[:lasttime] = gen}
        run_ga(actions, pop, action_set, input, gen, pop_size)
      end
    else
      e,a = (pred_array[action][:weight]-reward).abs, ((reward==1000.0) ? 1 : 0)
      perf << {:error=>e,:correct=>a}
      if perf.size >= 50
        err = (perf.inject(0){|s,x|s+x[:error]}/perf.size).round
        acc = perf.inject(0.0){|s,x|s+x[:correct]}/perf.size
        puts " >iter=#{gen+1} size=#{pop.size}, error=#{err}, acc=#{acc}"
        perf = []
      end
    end
  end
  return pop
end

def test_model(system, num_trials=50)
  correct = 0
  num_trials.times do
    input = random_bitstring()
    match_set = system.select{|c| does_match?(input, c[:condition])}
    pred_array = generate_prediction(match_set)
    action = select_action(pred_array, false)
    correct += 1 if target_function(input) == action.to_i
  end
  puts "Done! classified correctly=#{correct}/#{num_trials}"
  return correct
end

def execute(pop_size, max_gens, actions, ga_freq)
  system = train_model(pop_size, max_gens, actions, ga_freq)
  test_model(system)
  return system
end

if __FILE__ == $0
  # problem configuration
  all_actions = ['0', '1']
  # algorithm configuration
  max_gens, pop_size = 5000, 200
  ga_freq = 25
  # execute the algorithm
  execute(pop_size, max_gens, all_actions, ga_freq)
end
3.9.6 References
Primary Sources 

Early ideas on the theory of Learning Classifier Systems were proposed 
by Holland [4, 7], culminating in a standardized presentation a few years 
later [5]. A number of implementations of the theoretical system were 
investigated, although a taxonomy of the two main streams was proposed 
by De Jong [9]: 1) Pittsburgh-style proposed by Smith [11, 12] and 
2) Holland-style or Michigan-style Learning classifiers that are further 
comprised of the Zeroth-level classifier (ZCS) [13] and the accuracy-based 
classifier (XCS) [14]. 
Learn More

Booker, Goldberg, and Holland provide a classical introduction to
Learning Classifier Systems including an overview of the state of the field
and the algorithm in detail [1]. Wilson and Goldberg also provide an
introduction and review of the approach, taking a more critical stance
[16]. Holmes et al. provide a contemporary review of the field focusing
both on a description of the method and application areas to which
the approach has been demonstrated successfully [8]. Lanzi, Stolzmann,
and Wilson provide a seminal book in the field as a collection of papers
covering the basics, advanced topics, and demonstration applications;
a particular highlight from this book is the first section that provides
a concise description of Learning Classifier Systems by many leaders
and major contributors to the field [6], providing rare insight. Another
paper from Lanzi and Riolo's book provides a detailed review of the
development of the approach as it matured throughout the 1990s [10].
Bull and Kovacs provide a second introductory book to the field
focusing on the theory of the approach and its practical application [2].
3.10 Non-dominated Sorting Genetic Algorithm

Non-dominated Sorting Genetic Algorithm, Nondominated Sorting Ge- 
netic Algorithm, Fast Elitist Non-dominated Sorting Genetic Algorithm, 
NSGA, NSGA-II, NSGAII. 

3.10.1 Taxonomy 

The Non-dominated Sorting Genetic Algorithm is a Multiple Objective 
Optimization (MOO) algorithm and is an instance of an Evolutionary 
Algorithm from the field of Evolutionary Computation. Refer to Sec- 
tion 9.5.3 for more information and references on Multiple Objective 
Optimization. NSGA is an extension of the Genetic Algorithm for mul- 
tiple objective function optimization (Section 3.2). It is related to other 
Evolutionary Multiple Objective Optimization Algorithms (EMOO) (or 
Multiple Objective Evolutionary Algorithms MOEA) such as the Vector- 
Evaluated Genetic Algorithm (VEGA), Strength Pareto Evolutionary 
Algorithm (SPEA) (Section 3.11), and Pareto Archived Evolution Strat- 
egy (PAES). There are two versions of the algorithm, the classical NSGA 
and the updated and currently canonical form NSGA-II. 

3.10.2 Strategy 

The objective of the NSGA algorithm is to improve the adaptive fit 
of a population of candidate solutions to a Pareto front constrained 
by a set of objective functions. The algorithm uses an evolutionary 
process with surrogates for evolutionary operators including selection, 
genetic crossover, and genetic mutation. The population is sorted into a 
hierarchy of sub-populations based on the ordering of Pareto dominance. 
Similarity between members of each sub-group is evaluated on the 
Pareto front, and the resulting groups and similarity measures are used 
to promote a diverse front of non-dominated solutions. 
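Pareto dominance, on which the sorting into sub-populations rests, can be sketched as follows for minimization. The sketch assumes the common strict definition (no worse in every objective and strictly better in at least one); the names `pareto_dominates?` and `non_dominated` are illustrative and operate on plain arrays of objective values.

```ruby
# True when objective vector a dominates b under minimization.
def pareto_dominates?(a, b)
  no_worse = a.zip(b).all? { |x, y| x <= y }
  better   = a.zip(b).any? { |x, y| x < y }
  no_worse && better
end

# Members of a set of objective vectors that no other member dominates.
def non_dominated(points)
  points.reject { |p| points.any? { |q| pareto_dominates?(q, p) } }
end

points = [[1.0, 4.0], [2.0, 2.0], [3.0, 3.0], [4.0, 1.0]]
puts non_dominated(points).inspect
```

Here the point [3.0, 3.0] is dominated by [2.0, 2.0], while the remaining three points trade one objective against the other and together form the first front.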

3.10.3 Procedure 

Algorithm 3.10.1 provides a pseudocode listing of the Non-dominated
Sorting Genetic Algorithm II (NSGA-II) for minimizing a cost function.
The SortByRankAndDistance function orders the population into a
hierarchy of non-dominated Pareto fronts. The CrowdingDistanceAssignment
function calculates the average distance between members of each
front on the front itself. Refer to Deb et al. for a clear presentation of
the pseudocode and explanation of these functions [4]. The
CrossoverAndMutation function performs the classical crossover and mutation
genetic operators of the Genetic Algorithm. Both the
SelectParentsByRankAndDistance and SortByRankAndDistance functions discriminate
members of the population first by rank (order of dominated precedence
of the front to which the solution belongs) and then distance within the
front (calculated by CrowdingDistanceAssignment).
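The crowding distance can be sketched as follows, using the usual formulation in which the front is sorted on each objective in turn, boundary solutions receive an infinite distance, and interior solutions accumulate the normalized gap between their two neighbours. The method name `crowding_distances` and the use of plain arrays of objective values are assumptions made for illustration.

```ruby
# Crowding distance for a front given as an array of objective vectors.
# Returns one distance per member, in the same order as the input.
def crowding_distances(front)
  dist = Array.new(front.size, 0.0)
  return dist if front.empty?
  front.first.size.times do |m|
    # Indices of the front ordered by objective m.
    order = (0...front.size).sort_by { |i| front[i][m] }
    lo, hi = front[order.first][m], front[order.last][m]
    # Boundary members are always preferred, so they get an infinite distance.
    dist[order.first] = dist[order.last] = Float::INFINITY
    next if hi == lo
    (1...(order.size - 1)).each do |k|
      gap = front[order[k + 1]][m] - front[order[k - 1]][m]
      dist[order[k]] += gap / (hi - lo).to_f
    end
  end
  dist
end

front = [[0.0, 4.0], [1.0, 1.0], [4.0, 0.0]]
puts crowding_distances(front).inspect
```

Solutions in sparse regions of a front receive larger distances, which is why ties in rank are broken in favor of the larger distance.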

3.10.4 Heuristics 

• NSGA was designed for and is suited to continuous function 
multiple objective optimization problem instances. 

• A binary representation can be used in conjunction with classical 
genetic operators such as one-point crossover and point mutation. 

• A real-valued representation is recommended for continuous func- 
tion optimization problems, in turn requiring representation spe- 
cific genetic operators such as Simulated Binary Crossover (SBX) 
and polynomial mutation [2]. 

3.10.5 Code Listing 

Listing 3.9 provides an example of the Non-dominated Sorting Genetic
Algorithm II (NSGA-II) implemented in the Ruby Programming
Language. The demonstration problem is an instance of continuous
multiple objective function optimization called SCH (problem one in [4]).
The problem seeks the minimum of two functions: $f_1 = \sum_{i=1}^{n} x_i^2$ and
$f_2 = \sum_{i=1}^{n} (x_i - 2)^2$, $-10 \leq x_i \leq 10$ and $n = 1$. The optimal solutions
for this function are $x \in [0, 2]$. The algorithm is an implementation of
NSGA-II based on the presentation by Deb et al. [4]. The algorithm
uses a binary string representation (16 bits per objective function
parameter) that is decoded and rescaled to the function domain. The
implementation uses a uniform crossover operator and point mutations
with a fixed mutation rate of $\frac{1}{L}$, where $L$ is the number of bits in a
solution's binary string.
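The decoding step and the two SCH objectives for $n = 1$ can be illustrated in isolation. This is a compact sketch that reads the bitstring with the most significant bit first; the method names `decode_param` and `sch_objectives` are assumptions made for illustration.

```ruby
# Decode a bitstring to a real value in [min, max] (most significant bit first).
def decode_param(bits, min, max)
  min + (max - min) * bits.to_i(2) / (2.0**bits.size - 1.0)
end

# The two SCH objectives for a single decision variable x.
def sch_objectives(x)
  [x**2, (x - 2.0)**2]
end

x = decode_param("1" * 16, -10.0, 10.0)
puts x
puts sch_objectives(0.0).inspect
puts sch_objectives(2.0).inspect
```

At x = 0 the first objective is minimal and at x = 2 the second is minimal; every value between them improves one objective only at the expense of the other, which is why the whole interval [0, 2] is Pareto optimal.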

Algorithm 3.10.1: Pseudocode for NSGAII.

Input: Population_size, ProblemSize, P_crossover, P_mutation
Output: Children
Population ← InitializePopulation(Population_size, ProblemSize);
EvaluateAgainstObjectiveFunctions(Population);
FastNondominatedSort(Population);
Selected ← SelectParentsByRank(Population, Population_size);
Children ← CrossoverAndMutation(Selected, P_crossover, P_mutation);
while ¬StopCondition() do
    EvaluateAgainstObjectiveFunctions(Children);
    Union ← Merge(Population, Children);
    Fronts ← FastNondominatedSort(Union);
    Parents ← ∅;
    Front_L ← ∅;
    foreach Front_i ∈ Fronts do
        CrowdingDistanceAssignment(Front_i);
        if Size(Parents) + Size(Front_i) > Population_size then
            Front_L ← i;
            Break();
        else
            Parents ← Merge(Parents, Front_i);
        end
    end
    if Size(Parents) < Population_size then
        Front_L ← SortByRankAndDistance(Front_L);
        for P_1 to P_{Population_size − Size(Front_L)} do
            Parents ← P_i;
        end
    end
    Selected ← SelectParentsByRankAndDistance(Parents, Population_size);
    Population ← Children;
    Children ← CrossoverAndMutation(Selected, P_crossover, P_mutation);
end
return Children;

def objective1(vector)
  return vector.inject(0.0) {|sum, x| sum + (x**2.0)}
end

def objective2(vector)
  return vector.inject(0.0) {|sum, x| sum + ((x-2.0)**2.0)}
end

def decode(bitstring, search_space, bits_per_param)
  vector = []
  search_space.each_with_index do |bounds, i|
    off, sum = i*bits_per_param, 0.0
    # reverse so that index j carries the weight 2**j
    param = bitstring[off...(off+bits_per_param)].reverse
    param.size.times do |j|
      sum += ((param[j].chr=='1') ? 1.0 : 0.0) * (2.0 ** j.to_f)
    end
    min, max = bounds
    # rescale the integer value to the bounds of this parameter
    vector << min + ((max-min)/((2.0**bits_per_param.to_f)-1.0)) * sum
  end
  return vector
end

def random_bitstring(num_bits)
  return (0...num_bits).inject(""){|s,i| s<<((rand<0.5) ? "1" : "0")}
end

def point_mutation(bitstring, rate=1.0/bitstring.size)
  child = ""
  bitstring.size.times do |i|
    bit = bitstring[i].chr
    child << ((rand()<rate) ? ((bit=='1') ? "0" : "1") : bit)
  end
  return child
end

def crossover(parent1, parent2, rate)
  return ""+parent1 if rand()>=rate
  child = ""
  parent1.size.times do |i|
    child << ((rand()<0.5) ? parent1[i].chr : parent2[i].chr)
  end
  return child
end

def reproduce(selected, pop_size, p_cross)
  children = []
  selected.each_with_index do |p1, i|
    p2 = (i.modulo(2)==0) ? selected[i+1] : selected[i-1]
    p2 = selected[0] if i == selected.size-1
    child = {}
    child[:bitstring] = crossover(p1[:bitstring], p2[:bitstring], p_cross)
    child[:bitstring] = point_mutation(child[:bitstring])
    children << child
    break if children.size >= pop_size
  end
  return children
end

def calculate_objectives(pop, search_space, bits_per_param)
  pop.each do |p|
    p[:vector] = decode(p[:bitstring], search_space, bits_per_param)
    p[:objectives] = [objective1(p[:vector]), objective2(p[:vector])]
  end
end

def dominates(p1, p2)
  p1[:objectives].each_index do |i|
    return false if p1[:objectives][i] > p2[:objectives][i]
  end
  return true
end
def f ast_nondominated_sort (pop) 
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fronts = Array .new(l) {[] } 
pop. each do I pi I 

pi [ : dom_count] , pi [:dom_set] =0, [] 
pop . each do I p2 I 

if dominates (pi , p2) 
pi [:dom_set] « p2 
elsif dominates (p2, pi) 

pi [:dom_ count] += 1 
end 
end 

if pi [:dom_count] == 0 
pi [: rank] = 0 
fronts. first « pi 
end 
end 

curr = 0 
begin 

next_front = [] 
fronts [curr] . each do I pi I 
pi [ : dom_set] . each do |p2| 
p2 [ : dom_count] -= 1 
if p2 [ : dom_count] == 0 
p2[:rank] = (curr+1) 
next_front << p2 
end 
end 
end 

curr += 1 

fronts << next_front if !next_front . empty? 
end while curr < fronts. size 
return fronts 
end 

# note: neighbors are taken in the current array order of pop; the front is
# not re-sorted by each objective before the differences are accumulated
def calculate_crowding_distance(pop)
  pop.each {|p| p[:dist] = 0.0}
  num_obs = pop.first[:objectives].size
  num_obs.times do |i|
    min = pop.min{|x,y| x[:objectives][i]<=>y[:objectives][i]}
    max = pop.max{|x,y| x[:objectives][i]<=>y[:objectives][i]}
    rge = max[:objectives][i] - min[:objectives][i]
    pop.first[:dist], pop.last[:dist] = 1.0/0.0, 1.0/0.0
    next if rge == 0.0
    (1...(pop.size-1)).each do |j|
      pop[j][:dist]+=(pop[j+1][:objectives][i]-pop[j-1][:objectives][i])/rge
    end
  end
end

# order by rank (ascending), then by crowding distance (descending)
def crowded_comparison_operator(x,y)
  return y[:dist]<=>x[:dist] if x[:rank] == y[:rank]
  return x[:rank]<=>y[:rank]
end

# tournament winner: lower rank, or larger distance when ranks are equal
def better(x,y)
  if !x[:dist].nil? and x[:rank] == y[:rank]
    return (x[:dist]>y[:dist]) ? x : y
  end
  return (x[:rank]<y[:rank]) ? x : y
end

# fill the parent set front by front, truncating the last front by distance
def select_parents(fronts, pop_size)
  fronts.each {|f| calculate_crowding_distance(f)}
  offspring, last_front = [], 0
  fronts.each do |front|
    break if (offspring.size+front.size) > pop_size
    front.each {|p| offspring << p}
    last_front += 1
  end
  if (remaining = pop_size-offspring.size) > 0
    fronts[last_front].sort!{|x,y| crowded_comparison_operator(x,y)}
    offspring += fronts[last_front][0...remaining]
  end
  return offspring
end

# unweighted sum of the objective values, used only for progress reporting
def weighted_sum(x)
  return x[:objectives].inject(0.0) {|sum, obj| sum+obj}
end

def search(search_space, max_gens, pop_size, p_cross, bits_per_param=16)
  pop = Array.new(pop_size) do |i|
    {:bitstring=>random_bitstring(search_space.size*bits_per_param)}
  end
  calculate_objectives(pop, search_space, bits_per_param)
  fast_nondominated_sort(pop)
  selected = Array.new(pop_size) do
    better(pop[rand(pop_size)], pop[rand(pop_size)])
  end
  children = reproduce(selected, pop_size, p_cross)
  calculate_objectives(children, search_space, bits_per_param)
  max_gens.times do |gen|
    union = pop + children
    fronts = fast_nondominated_sort(union)
    parents = select_parents(fronts, pop_size)
    selected = Array.new(pop_size) do
      better(parents[rand(pop_size)], parents[rand(pop_size)])
    end
    pop = children
    children = reproduce(selected, pop_size, p_cross)
    calculate_objectives(children, search_space, bits_per_param)
    best = parents.sort {|x,y| weighted_sum(x)<=>weighted_sum(y)}.first
    best_s = "[x=#{best[:vector]}, objs=#{best[:objectives].join(', ')}]"
    puts " > gen=#{gen+1}, fronts=#{fronts.size}, best=#{best_s}"
  end
  union = pop + children
  fronts = fast_nondominated_sort(union)
  parents = select_parents(fronts, pop_size)
  return parents
end

if __FILE__ == $0
  # problem configuration
  problem_size = 1
  search_space = Array.new(problem_size) {|i| [-10, 10]}
  # algorithm configuration
  max_gens = 50
  pop_size = 100
  p_cross = 0.98
  # execute the algorithm
  pop = search(search_space, max_gens, pop_size, p_cross)
  puts "done!"
end

Listing 3.9: NSGA-II in Ruby 
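
The crowding distance in the listing above accumulates differences between neighbors in the current array order of a front. As a point of comparison, the following is a minimal sketch (not the author's code) of a variant that first orders the front by each objective, which is how the measure is usually described. The function name crowding_distances and the plain-array representation of objective vectors are assumptions made for illustration.

```ruby
# Crowding distance for a front given as an array of objective vectors.
# Returns one distance per member, in the original order of the front.
def crowding_distances(front)
  dist = Array.new(front.size, 0.0)
  num_objectives = front.first.size
  num_objectives.times do |m|
    # indices of the front ordered by objective m (ascending)
    order = (0...front.size).sort_by { |i| front[i][m] }
    range = (front[order.last][m] - front[order.first][m]).to_f
    # boundary solutions are always kept, so give them infinite distance
    dist[order.first] = dist[order.last] = Float::INFINITY
    next if range == 0.0
    (1...(order.size - 1)).each do |j|
      gap = front[order[j + 1]][m] - front[order[j - 1]][m]
      dist[order[j]] += gap / range
    end
  end
  dist
end

front = [[1.0, 1.0], [0.0, 4.0], [4.0, 0.0]]
puts crowding_distances(front).inspect
```

In this small front the two extreme points receive an infinite distance and the interior point accumulates one normalized gap per objective.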



3.10.6 References 
Primary Sources 

Srinivas and Deb proposed the NSGA inspired by Goldberg's notion
of a non-dominated sorting procedure [6]. Goldberg proposed a
non-dominated sorting procedure in his book in considering the biases in
the Pareto optimal solutions provided by VEGA [5]. Srinivas and Deb's
NSGA used the sorting procedure as a ranking selection method, and
a fitness sharing niching method to maintain stable sub-populations
across the Pareto front. Deb et al. later extended NSGA to address
three criticisms of the approach: the $O(mN^3)$ time complexity, the lack
of elitism, and the need for a sharing parameter for the fitness sharing
niching method [3, 4].

Learn More 

Deb provides in-depth coverage of Evolutionary Multiple Objective
Optimization algorithms in his book, including a detailed description of
the NSGA in Chapter 5 [1].

3.11 Strength Pareto Evolutionary Algorithm

Strength Pareto Evolutionary Algorithm, SPEA, SPEA2. 

3.11.1 Taxonomy 

Strength Pareto Evolutionary Algorithm is a Multiple Objective Opti- 
mization (MOO) algorithm and an Evolutionary Algorithm from the 
field of Evolutionary Computation. It belongs to the field of Evolution- 
ary Multiple Objective (EMO) algorithms. Refer to Section 9.5.3 for 
more information and references on Multiple Objective Optimization. 
Strength Pareto Evolutionary Algorithm is an extension of the Genetic 
Algorithm for multiple objective optimization problems (Section 3.2). 
It is related to sibling Evolutionary Algorithms such as Non-dominated 
Sorting Genetic Algorithm (NSGA) (Section 3.10), Vector-Evaluated 
Genetic Algorithm (VEGA), and Pareto Archived Evolution Strategy 
(PAES). There are two versions of SPEA, the original SPEA algorithm 
and the extension SPEA2. Additional extensions include SPEA+ and 
iSPEA. 

3.11.2 Strategy 

The objective of the algorithm is to locate and maintain a front of
non-dominated solutions, ideally a set of Pareto optimal solutions. This
is achieved by using an evolutionary process (with surrogate procedures
for genetic recombination and mutation) to explore the search space,
and a selection process that uses a combination of the degree to which a
candidate solution is dominated (strength) and an estimation of density
of the Pareto front as an assigned fitness. An archive of the
non-dominated set is maintained separate from the population of candidate
solutions used in the evolutionary process, providing a form of elitism.

3.11.3 Procedure 

Algorithm 3.11.1 provides a pseudocode listing of the Strength Pareto
Evolutionary Algorithm 2 (SPEA2) for minimizing a cost function. The
CalculateRawFitness function calculates the raw fitness as the sum of
the strength values of the solutions that dominate a given candidate,
where strength is the number of solutions that a given solution dominates.
The CalculateSolutionDensity function estimates the density of an area of
the Pareto front as $\frac{1}{\sigma^k + 2}$, where $\sigma^k$ is the
Euclidean distance of the objective values between a given solution and the
$k$th nearest neighbor of the solution, and $k$ is the square root of the
size of the population and archive combined. The PopulateWithRemainingBest
function iteratively fills the archive with the remaining candidate
solutions in order of fitness. The RemoveMostSimilar function truncates the
archive population, removing those members with the smallest $\sigma^k$
values as calculated against the archive. The SelectParents function selects
parents from a population using a Genetic Algorithm selection method such as
binary tournament selection. The CrossoverAndMutation function performs the
crossover and mutation genetic operators from the Genetic Algorithm.
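
The strength and raw fitness calculations described above can be illustrated with a small worked example. This is a minimal sketch rather than the book's implementation: the helper names (strictly_dominates?, strengths, raw_fitness) are hypothetical, solutions are represented as plain arrays of objective values, and strict Pareto dominance for minimization is assumed.

```ruby
# Assumed definition: a dominates b when it is no worse in every objective
# and strictly better in at least one (minimization).
def strictly_dominates?(a, b)
  no_worse = a.each_index.all? { |i| a[i] <= b[i] }
  better = a.each_index.any? { |i| a[i] < b[i] }
  no_worse && better
end

# Strength of each solution: the number of solutions it dominates.
def strengths(union)
  union.map { |a| union.count { |b| strictly_dominates?(a, b) } }
end

# Raw fitness of each solution: the summed strength of its dominators.
def raw_fitness(union)
  s = strengths(union)
  union.map do |a|
    dominators = union.each_index.select { |j| strictly_dominates?(union[j], a) }
    dominators.sum { |j| s[j] }
  end
end

union = [[1.0, 4.0], [2.0, 2.0], [4.0, 1.0], [3.0, 3.0], [5.0, 5.0]]
puts strengths(union).inspect
puts raw_fitness(union).inspect
```

The first three points are mutually non-dominated, so their raw fitness is zero; the remaining points are penalized by the combined strength of whatever dominates them.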

Algorithm 3.11.1: Pseudocode for SPEA2.
Input: Population_size, Archive_size, ProblemSize, P_crossover, P_mutation
Output: Archive

Population ← InitializePopulation(Population_size, ProblemSize);
Archive ← ∅;
while ¬StopCondition() do
    for S_i ∈ Population do
        Si_objectives ← CalculateObjectives(S_i);
    end
    Union ← Population + Archive;
    for S_i ∈ Union do
        Si_raw ← CalculateRawFitness(S_i, Union);
        Si_density ← CalculateSolutionDensity(S_i, Union);
        Si_fitness ← Si_raw + Si_density;
    end
    Archive ← GetNonDominated(Union);
    if Size(Archive) < Archive_size then
        PopulateWithRemainingBest(Union, Archive, Archive_size);
    else if Size(Archive) > Archive_size then
        RemoveMostSimilar(Archive, Archive_size);
    end
    Selected ← SelectParents(Archive, Population_size);
    Population ← CrossoverAndMutation(Selected, P_crossover, P_mutation);
end
return GetNonDominated(Archive);



3.11.4 Heuristics 

• SPEA was designed for and is suited to combinatorial and contin- 
uous function multiple objective optimization problem instances. 

• A binary representation can be used for continuous function
optimization problems in conjunction with classical genetic operators
such as one-point crossover and point mutation.

• A k value of 1 may be used for efficiency whilst still providing 
useful results. 

• The size of the archive is commonly smaller than the size of the 
population. 

• There is a lot of room for implementation optimization in density 
and Pareto dominance calculations. 
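
The effect of the k value on the density estimate can be seen in a short sketch. The helper names objective_distance and density are hypothetical, objective vectors are plain arrays, and k is counted from 1 with the point itself excluded from its own neighbor list; these are assumptions for illustration rather than details taken from the listing.

```ruby
# Euclidean distance between two objective vectors.
def objective_distance(a, b)
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

# Density of point within others: 1 / (sigma_k + 2), where sigma_k is the
# distance to the k-th nearest neighbor (k = 1 is the closest other point).
def density(point, others, k)
  neighbors = others.reject { |o| o.equal?(point) }
  distances = neighbors.map { |o| objective_distance(point, o) }.sort
  1.0 / (distances[k - 1] + 2.0)
end

union = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0], [0.0, 1.0]]
k = Math.sqrt(union.size).to_i
puts density(union[0], union, 1)
puts density(union[0], union, k)
```

Because the denominator is always at least 2, the density stays below 1, so adding it to an integer raw fitness never changes the ordering imposed by dominance; it only separates solutions with equal raw fitness.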

3.11.5 Code Listing 

Listing 3.10 provides an example of the Strength Pareto Evolutionary
Algorithm 2 (SPEA2) implemented in the Ruby Programming
Language. The demonstration problem is an instance of continuous
multiple objective function optimization called SCH (problem one in [1]).
The problem seeks the minimum of two functions: $f1 = \sum_{i=1}^{n} x_i^2$ and
$f2 = \sum_{i=1}^{n} (x_i - 2)^2$, $-10 \leq x_i \leq 10$ and $n = 1$. The optimal solutions
for this function are $x \in [0, 2]$. The algorithm is an implementation of
SPEA2 based on the presentation by Zitzler, Laumanns, and Thiele [5].
The algorithm uses a binary string representation (16 bits per objective
function parameter) that is decoded and rescaled to the function domain.
The implementation uses a uniform crossover operator and point
mutations with a fixed mutation rate of $\frac{1}{L}$, where $L$ is the number of
bits in a solution's binary string.
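
To see why the optimal solutions of SCH lie in the interval between 0 and 2, it can help to evaluate both objectives at a few points. The following is a small illustrative sketch for the single-variable case; the function names sch_f1 and sch_f2 are hypothetical and are not part of the listing.

```ruby
# SCH with n = 1: both objectives are to be minimized.
def sch_f1(x)
  x**2
end

def sch_f2(x)
  (x - 2.0)**2
end

[-1.0, 0.0, 1.0, 2.0, 3.0].each do |x|
  puts "x=#{x} f1=#{sch_f1(x)} f2=#{sch_f2(x)}"
end
```

Inside the interval, moving x trades one objective against the other, so no point there is better than another in both objectives. Outside the interval, for example at x = -1 or x = 3, both objectives can be improved together by moving toward x = 1, so those points are dominated.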

# first SCH objective: sum of squares
def objective1(vector)
  return vector.inject(0.0) {|sum, x| sum + (x**2.0)}
end

# second SCH objective: sum of squared offsets from 2
def objective2(vector)
  return vector.inject(0.0) {|sum, x| sum + ((x-2.0)**2.0)}
end

# decode each group of bits to a real value within its parameter bounds
def decode(bitstring, search_space, bits_per_param)
  vector = []
  search_space.each_with_index do |bounds, i|
    off, sum = i*bits_per_param, 0.0
    param = bitstring[off...(off+bits_per_param)].reverse
    param.size.times do |j|
      sum += ((param[j].chr=='1') ? 1.0 : 0.0) * (2.0 ** j.to_f)
    end
    min, max = bounds
    vector << min + ((max-min)/((2.0**bits_per_param.to_f)-1.0)) * sum
  end
  return vector
end

# flip each bit independently with probability rate (1/L by default)
def point_mutation(bitstring, rate=1.0/bitstring.size)
  child = ""
  bitstring.size.times do |i|
    bit = bitstring[i].chr
    child << ((rand()<rate) ? ((bit=='1') ? "0" : "1") : bit)
  end
  return child
end

# pick two distinct members at random and return the one with lower fitness
def binary_tournament(pop)
  i, j = rand(pop.size), rand(pop.size)
  j = rand(pop.size) while j==i
  return (pop[i][:fitness] < pop[j][:fitness]) ? pop[i] : pop[j]
end

# uniform crossover: each bit is copied from either parent with equal chance
def crossover(parent1, parent2, rate)
  return ""+parent1 if rand()>=rate
  child = ""
  parent1.size.times do |i|
    child << ((rand()<0.5) ? parent1[i].chr : parent2[i].chr)
  end
  return child
end

# pair up neighboring selected parents and create one child per parent
def reproduce(selected, pop_size, p_cross)
  children = []
  selected.each_with_index do |p1, i|
    p2 = (i.modulo(2)==0) ? selected[i+1] : selected[i-1]
    p2 = selected[0] if i == selected.size-1
    child = {}
    child[:bitstring] = crossover(p1[:bitstring], p2[:bitstring], p_cross)
    child[:bitstring] = point_mutation(child[:bitstring])
    children << child
    break if children.size >= pop_size
  end
  return children
end

# build a random string of "0" and "1" characters
def random_bitstring(num_bits)
  return (0...num_bits).inject(""){|s,i| s<<((rand<0.5) ? "1" : "0")}
end

# decode each bitstring and evaluate both objective functions
def calculate_objectives(pop, search_space, bits_per_param)
  pop.each do |p|
    p[:vector] = decode(p[:bitstring], search_space, bits_per_param)
    p[:objectives] = []
    p[:objectives] << objective1(p[:vector])
    p[:objectives] << objective2(p[:vector])
  end
end

# true if p1 is no worse than p2 in every objective (minimization)
def dominates?(p1, p2)
  p1[:objectives].each_index do |i|
    return false if p1[:objectives][i] > p2[:objectives][i]
  end
  return true
end

# unweighted sum of the objective values, used only for progress reporting
def weighted_sum(x)
  return x[:objectives].inject(0.0) {|sum, obj| sum+obj}
end

def euclidean_distance(c1, c2)
  sum = 0.0
  c1.each_index {|i| sum += (c1[i]-c2[i])**2.0}
  return Math.sqrt(sum)
end

# record, for every member, the set of other members that it dominates
def calculate_dominated(pop)
  pop.each do |p1|
    p1[:dom_set] = pop.select {|p2| p1!=p2 and dominates?(p1, p2)}
  end
end

# sum the strengths of the other solutions that dominate p1; p1 itself is
# skipped so that a solution's own strength does not count against it
def calculate_raw_fitness(p1, pop)
  pop.inject(0.0) do |sum, p2|
    (!p1.equal?(p2) and dominates?(p2, p1)) ? sum + p2[:dom_set].size.to_f : sum
  end
end

# density from the distance to the k-th nearest neighbor in objective space
# (index 0 of the sorted list is p1 itself, at distance zero)
def calculate_density(p1, pop)
  pop.each do |p2|
    p2[:dist] = euclidean_distance(p1[:objectives], p2[:objectives])
  end
  list = pop.sort{|x,y| x[:dist]<=>y[:dist]}
  k = Math.sqrt(pop.size).to_i
  return 1.0 / (list[k][:dist] + 2.0)
end

# assign raw fitness, density and combined fitness over archive + population
def calculate_fitness(pop, archive, search_space, bits_per_param)
  calculate_objectives(pop, search_space, bits_per_param)
  union = archive + pop
  calculate_dominated(union)
  union.each do |p|
    p[:raw_fitness] = calculate_raw_fitness(p, union)
    p[:density] = calculate_density(p, union)
    p[:fitness] = p[:raw_fitness] + p[:density]
  end
end

# build the next archive: keep members with fitness below 1, then either
# top up with the best remaining members or truncate the most crowded ones
def environmental_selection(pop, archive, archive_size)
  union = archive + pop
  environment = union.select {|p| p[:fitness]<1.0}
  if environment.size < archive_size
    union.sort!{|x,y| x[:fitness]<=>y[:fitness]}
    union.each do |p|
      environment << p if p[:fitness] >= 1.0
      break if environment.size >= archive_size
    end
  elsif environment.size > archive_size
    begin
      k = Math.sqrt(environment.size).to_i
      environment.each do |p1|
        environment.each do |p2|
          p2[:dist] = euclidean_distance(p1[:objectives], p2[:objectives])
        end
        list = environment.sort{|x,y| x[:dist]<=>y[:dist]}
        p1[:density] = list[k][:dist]
      end
      environment.sort!{|x,y| x[:density]<=>y[:density]}
      environment.shift
    end until environment.size <= archive_size
  end
  return environment
end

def search(search_space, max_gens, pop_size, archive_size, p_cross, bits_per_param=16)
  pop = Array.new(pop_size) do |i|
    {:bitstring=>random_bitstring(search_space.size*bits_per_param)}
  end
  gen, archive = 0, []
  begin
    calculate_fitness(pop, archive, search_space, bits_per_param)
    archive = environmental_selection(pop, archive, archive_size)
    best = archive.sort{|x,y| weighted_sum(x)<=>weighted_sum(y)}.first
    puts ">gen=#{gen}, objs=#{best[:objectives].join(', ')}"
    break if gen >= max_gens
    selected = Array.new(pop_size){binary_tournament(archive)}
    pop = reproduce(selected, pop_size, p_cross)
    gen += 1
  end while true
  return archive
end

if __FILE__ == $0
  # problem configuration
  problem_size = 1
  search_space = Array.new(problem_size) {|i| [-10, 10]}
  # algorithm configuration
  max_gens = 50
  pop_size = 80
  archive_size = 40
  p_cross = 0.90
  # execute the algorithm
  pop = search(search_space, max_gens, pop_size, archive_size, p_cross)
  puts "done!"
end



Listing 3.10: SPEA2 in Ruby 



3.11.6 References 
Primary Sources 

Zitzler and Thiele introduced the Strength Pareto Evolutionary
Algorithm as a technical report on a multiple objective optimization
algorithm with elitism and clustering along the Pareto front [6]. The
technical report was later published [7]. The Strength Pareto
Evolutionary Algorithm was developed as a part of Zitzler's PhD thesis [2].
Zitzler, Laumanns, and Thiele later extended SPEA to address some
inefficiencies of the approach; the resulting algorithm was called SPEA2
and was released as a technical report [4] and later published [5]. SPEA2
provides fine-grained fitness assignment, density estimation of the Pareto
front, and an archive truncation operator.

Learn More 

Zitzler, Laumanns, and Bleuler provide a tutorial on SPEA2 as a book 
chapter that considers the basics of multiple objective optimization, and 
the differences from SPEA and the other related Multiple Objective 
Evolutionary Algorithms [3]. 

4.1 Overview

This chapter describes Physical Algorithms. 

4.1.1 Physical Properties 

Physical algorithms are those algorithms inspired by a physical
process. The described physical algorithms generally belong to the fields of
Metaheuristics and Computational Intelligence, although they do not fit
neatly into the existing categories of the biologically inspired techniques
(such as Swarm, Immune, Neural, and Evolution). In this vein, they could
just as easily be referred to as nature inspired algorithms.

The inspiring physical systems range from metallurgy, music, the
interplay between culture and evolution, and complex dynamic systems
such as avalanches. They are generally stochastic optimization
algorithms with a mixture of local (neighborhood-based) and global search
techniques.

4.1.2 Extensions 

There are many other algorithms and classes of algorithm inspired by
natural systems that were not described, including but not limited to:

• More Annealing: Extensions to the classical Simulated Annealing
algorithm, such as Adaptive Simulated Annealing (formerly
Very Fast Simulated Re-annealing) [3, 4], and Quantum Annealing
[1, 2].

• Stochastic tunneling: based on the physical idea of a particle
tunneling through structures [5].



4.1.3 Bibliography 

[1] B. Apolloni, C. Caravalho, and D. De Falco. Quantum stochastic 
optimization. Stochastic Processes and their Applications, 33:233- 
244, 1989. 

[2] A. Das and B. K. Chakrabarti. Quantum annealing and related 
optimization methods. Springer, 2005. 

[3] L. Ingber. Very fast simulated re-annealing. Mathematical and
Computer Modelling, 12(8):967-973, 1989.

[4] L. Ingber. Adaptive simulated annealing (ASA): Lessons learned.
Control and Cybernetics, 25(1):33-54, 1996.

[5] W. Wenzel and K. Hamacher. A stochastic tunneling approach for
global minimization of complex potential energy landscapes. Phys.
Rev. Lett., 82(15):3003-3007, 1999.






4.2 Simulated Annealing 

Simulated Annealing, SA. 

4.2.1 Taxonomy 

Simulated Annealing is a global optimization algorithm that belongs
to the field of Stochastic Optimization and Metaheuristics. Simulated
Annealing is an adaptation of the Metropolis-Hastings Monte Carlo
algorithm and is used in function optimization. Like the Genetic
Algorithm (Section 3.2), it provides a basis for a large variety of
extensions and specializations of the general method, including but not
limited to Parallel Simulated Annealing, Fast Simulated Annealing, and
Adaptive Simulated Annealing.

4.2.2 Inspiration 

Simulated Annealing is inspired by the process of annealing in metallurgy. 
In this natural process a material is heated and slowly cooled under 
controlled conditions to increase the size of the crystals in the material 
and reduce their defects. This has the effect of improving the strength 
and durability of the material. The heat increases the energy of the 
atoms allowing them to move freely, and the slow cooling schedule allows 
a new low-energy configuration to be discovered and exploited. 

4.2.3 Metaphor 

Each configuration of a solution in the search space represents a different 
internal energy of the system. Heating the system results in a relaxation 
of the acceptance criteria of the samples taken from the search space. 
As the system is cooled, the acceptance criteria of samples is narrowed 
to focus on improving movements. Once the system has cooled, the 
configuration will represent a sample at or close to a global optimum. 

4.2.4 Strategy 

The information processing objective of the technique is to locate the
minimum cost configuration in the search space. The algorithm's plan
of action is to probabilistically re-sample the problem space where the
acceptance of new samples into the currently held sample is managed
by a probabilistic function that becomes more discerning of the cost
of samples it accepts over the execution time of the algorithm. This
probabilistic decision is based on the Metropolis-Hastings algorithm for
simulating samples from a thermodynamic system.

4.2.5 Procedure

Algorithm 4.2.1 provides a pseudocode listing of the main Simulated 
Annealing algorithm for minimizing a cost function. 

Algorithm 4.2.1: Pseudocode for Simulated Annealing.
Input: ProblemSize, iterations_max, temp_max
Output: S_best

S_current ← CreateInitialSolution(ProblemSize);
S_best ← S_current;
for i = 1 to iterations_max do
    S_i ← CreateNeighborSolution(S_current);
    temp_curr ← CalculateTemperature(i, temp_max);
    if Cost(S_i) < Cost(S_current) then
        S_current ← S_i;
        if Cost(S_i) < Cost(S_best) then
            S_best ← S_i;
        end
    else if Exp((Cost(S_current) - Cost(S_i)) / temp_curr) > Rand() then
        S_current ← S_i;
    end
end
return S_best;



4.2.6 Heuristics 

• Simulated Annealing was designed for use with combinatorial op- 
timization problems, although it has been adapted for continuous 
function optimization problems. 

• The convergence proof suggests that with a long enough cooling 
period, the system will always converge to the global optimum. 
The downside of this theoretical finding is that the number of 
samples taken for optimum convergence to occur on some problems 
may be more than a complete enumeration of the search space. 

• Performance improvements can be given with the selection of a 
candidate move generation scheme (neighborhood) that is less 
likely to generate candidates of significantly higher cost. 

• Restarting the cooling schedule using the best found solution so 
far can lead to an improved outcome on some problems. 

• A common acceptance method is to always accept improving solutions
and accept worse solutions with a probability of
$P(accept) \leftarrow \exp(\frac{e - e'}{T})$, where $T$ is the current
temperature, $e$ is the energy (or cost) of the current solution and $e'$
is the energy of a candidate solution being considered.

• The size of the neighborhood considered in generating candidate 
solutions may also change over time or be influenced by the tem- 
perature, starting initially broad and narrowing with the execution 
of the algorithm. 

• A problem specific heuristic method can be used to provide the 
starting point for the search. 
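
The acceptance rule described in the heuristics above can be sketched as a small function that returns the probability of accepting a candidate. The function name acceptance_probability and the example costs are assumptions chosen for illustration; the rule itself is the exponential of the cost difference divided by the temperature.

```ruby
# Probability of accepting a candidate with cost candidate_cost when the
# current solution has cost current_cost at temperature temp (minimization).
def acceptance_probability(current_cost, candidate_cost, temp)
  # improving (or equal) candidates are always accepted
  return 1.0 if candidate_cost <= current_cost
  Math.exp((current_cost - candidate_cost) / temp.to_f)
end

# The same worsening move (cost 100 to 110) at progressively lower temperatures.
[1000.0, 100.0, 10.0, 1.0].each do |t|
  p_accept = acceptance_probability(100.0, 110.0, t)
  puts "T=#{t} P(accept)=#{p_accept.round(5)}"
end
```

At high temperature the worsening move is accepted almost as readily as an improving one, and as the temperature drops the same move becomes very unlikely, which is the narrowing of the acceptance criteria described in the metaphor.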

4.2.7 Code Listing 

Listing 4.1 provides an example of the Simulated Annealing algorithm 
implemented in the Ruby Programming Language. The algorithm is 
applied to the Berlin52 instance of the Traveling Salesman Problem 
(TSP), taken from the TSPLIB. The problem seeks a permutation of 
the order to visit cities (called a tour) that minimizes the total distance 
traveled. The optimal tour distance for the Berlin52 instance is 7542 units.
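
As a small illustration of the objective, the following sketch computes the length of a closed tour over a hypothetical four-city instance (not part of Berlin52), using rounded Euclidean distances. The names rounded_distance and tour_length are assumptions for this example.

```ruby
# Euclidean distance between two points, rounded to the nearest integer.
def rounded_distance(a, b)
  Math.sqrt((a[0] - b[0])**2 + (a[1] - b[1])**2).round
end

# Length of the closed tour that visits the cities in the given order and
# returns to the starting city.
def tour_length(order, cities)
  order.each_index.sum do |i|
    from = cities[order[i]]
    to = cities[order[(i + 1) % order.size]]
    rounded_distance(from, to)
  end
end

square = [[0, 0], [0, 10], [10, 10], [10, 0]]  # hypothetical instance
puts tour_length([0, 1, 2, 3], square)  # walks the perimeter
puts tour_length([0, 2, 1, 3], square)  # crosses both diagonals
```

The two permutations visit the same cities, but the one that follows the perimeter is shorter than the one that crosses the diagonals, which is the kind of difference the search exploits.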

The algorithm implementation uses a two-opt procedure for the
neighborhood function and the classical
$P(accept) \leftarrow \exp(\frac{e - e'}{T})$ as the acceptance function.
A simple geometric cooling regime is used with a large initial temperature
that is multiplied by a constant factor (temp_change) each iteration.
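
The listing that follows updates the temperature by multiplying it by a constant factor on every iteration. A minimal sketch of that schedule in closed form is given below; the function name temperature_at is hypothetical, and the initial temperature and factor are the values used in the listing's configuration.

```ruby
# Temperature after the given number of updates when it is multiplied by
# factor on every iteration, starting from max_temp.
def temperature_at(iteration, max_temp, factor)
  max_temp * factor**iteration
end

max_temp, factor = 100_000.0, 0.98
[0, 100, 500, 1000, 2000].each do |i|
  puts "iteration=#{i} temp=#{temperature_at(i, max_temp, factor)}"
end
```

Under these settings the temperature falls below 1 after roughly 570 of the 2000 iterations, so most of the run behaves much like a greedy local search that rarely accepts worsening moves. A slower factor or a higher starting temperature would extend the exploratory phase.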

# rounded Euclidean distance between two cities
def euc_2d(c1, c2)
  Math.sqrt((c1[0] - c2[0])**2.0 + (c1[1] - c2[1])**2.0).round
end

# total length of the closed tour described by the permutation
def cost(permutation, cities)
  distance = 0
  permutation.each_with_index do |c1, i|
    c2 = (i==permutation.size-1) ? permutation[0] : permutation[i+1]
    distance += euc_2d(cities[c1], cities[c2])
  end
  return distance
end

# shuffle the city indices into a random tour
def random_permutation(cities)
  perm = Array.new(cities.size){|i| i}
  perm.each_index do |i|
    r = rand(perm.size-i) + i
    perm[r], perm[i] = perm[i], perm[r]
  end
  return perm
end

# reverse a randomly chosen segment of the tour in place
def stochastic_two_opt!(perm)
  c1, c2 = rand(perm.size), rand(perm.size)
  exclude = [c1]
  exclude << ((c1==0) ? perm.size-1 : c1-1)
  exclude << ((c1==perm.size-1) ? 0 : c1+1)
  c2 = rand(perm.size) while exclude.include?(c2)
  c1, c2 = c2, c1 if c2 < c1
  perm[c1...c2] = perm[c1...c2].reverse
  return perm
end

# copy the current tour, apply one two-opt move and evaluate the result
def create_neighbor(current, cities)
  candidate = {}
  candidate[:vector] = Array.new(current[:vector])
  stochastic_two_opt!(candidate[:vector])
  candidate[:cost] = cost(candidate[:vector], cities)
  return candidate
end

# always accept improvements; accept worse tours with probability exp((e-e')/T)
def should_accept?(candidate, current, temp)
  return true if candidate[:cost] <= current[:cost]
  return Math.exp((current[:cost] - candidate[:cost]) / temp) > rand()
end

def search(cities, max_iter, max_temp, temp_change)
  current = {:vector=>random_permutation(cities)}
  current[:cost] = cost(current[:vector], cities)
  temp, best = max_temp, current
  max_iter.times do |iter|
    candidate = create_neighbor(current, cities)
    # geometric cooling: scale the temperature by a constant factor
    temp = temp * temp_change
    current = candidate if should_accept?(candidate, current, temp)
    best = candidate if candidate[:cost] < best[:cost]
    if (iter+1).modulo(10) == 0
      puts " > iteration #{(iter+1)}, temp=#{temp}, best=#{best[:cost]}"
    end
  end
  return best
end

if __FILE__ == $0
  # problem configuration
  berlin52 = [[565,575],[25,185],[345,750],[945,685],[845,655],
   [880,660],[25,230],[525,1000],[580,1175],[650,1130],[1605,620],
   [1220,580],[1465,200],[1530,5],[845,680],[725,370],[145,665],
   [415,635],[510,875],[560,365],[300,465],[520,585],[480,415],
   [835,625],[975,580],[1215,245],[1320,315],[1250,400],[660,180],
   [410,250],[420,555],[575,665],[1150,1160],[700,580],[685,595],
   [685,610],[770,610],[795,645],[720,635],[760,650],[475,960],
   [95,260],[875,920],[700,500],[555,815],[830,485],[1170,65],
   [830,610],[605,625],[595,360],[1340,725],[1740,245]]
  # algorithm configuration
  max_iterations = 2000
  max_temp = 100000.0
  temp_change = 0.98
  # execute the algorithm
  best = search(berlin52, max_iterations, max_temp, temp_change)
  puts "Done. Best Solution: c=#{best[:cost]}, v=#{best[:vector].inspect}"
end

Listing 4.1: Simulated Annealing in Ruby
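
The two-opt move used as the neighborhood function in the listing reverses a randomly chosen segment of the tour. A deterministic sketch of the same move, with the segment boundaries passed in explicitly, may make the effect easier to see. The function name two_opt_move and its non-mutating style are assumptions for this example and differ from the in-place version in the listing.

```ruby
# Return a copy of the tour with the segment from position i (inclusive)
# to position j (exclusive) reversed.
def two_opt_move(tour, i, j)
  i, j = j, i if j < i
  tour[0...i] + tour[i...j].reverse + tour[j..-1]
end

puts two_opt_move([0, 1, 2, 3, 4, 5], 1, 4).inspect
```

Reversing the segment replaces the two edges at its boundaries with two different edges while preserving every city in the tour, so the result is always a valid permutation.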



4.2.8 References 
Primary Sources 

Simulated Annealing is credited to Kirkpatrick, Gelatt, and Vecchi 
in 1983 [5]. Granville, Krivanek, and Rasson provided the proof for 
convergence for Simulated Annealing in 1994 [2]. There were a number of 
early studies and application papers such as Kirkpatrick's investigation 
into the TSP and minimum cut problems [4], and a study by Vecchi 
and Kirkpatrick on Simulated Annealing applied to the global wiring 
problem [7]. 

Learn More 

There are many excellent reviews of Simulated Annealing, not limited to 
the review by Ingber that describes improved methods such as Adaptive 
Simulated Annealing, Simulated Quenching, and hybrid methods [3]. 
There are books dedicated to Simulated Annealing, applications and 
variations. Two examples of good texts include "Simulated Annealing: 
Theory and Applications" by Laarhoven and Aarts [6] that provides 
an introduction to the technique and applications, and "Simulated 
Annealing: Parallelization Techniques" by Robert Azencott [1] that 
focuses on the theory and applications of parallel methods for Simulated 
Annealing. 

4.3 Extremal Optimization

Extremal Optimization, EO. 

4.3.1 Taxonomy 

Extremal Optimization is a stochastic search technique that has the 
properties of being a local and global search method. It is generally 
related to hill-climbing algorithms and provides the basis for extensions 
such as Generalized Extremal Optimization. 

4.3.2 Inspiration 

Extremal Optimization is inspired by the Bak-Sneppen self-organized
criticality model of co-evolution from the field of statistical physics. The
self-organized criticality model suggests that some dynamical systems
have a critical point as an attractor, whereby the systems exhibit
periods of slow movement or accumulation followed by short periods of
avalanche or instability. Examples of such systems include land
formation, earthquakes, and the dynamics of sand piles. The Bak-Sneppen
model considers these dynamics in co-evolutionary systems and in the
punctuated equilibrium model, which is described as long periods of
stasis followed by short periods of extinction and large evolutionary
change.

4.3.3 Metaphor 

The dynamics of the system result in the steady improvement of a 
candidate solution with sudden and large crashes in the quality of the 
candidate solution. These dynamics allow two main phases of activity 
in the system: 1) to exploit higher quality solutions in a local search like 
manner, and 2) escape possible local optima with a population crash 
and explore the search space for a new area of high quality solutions. 

4.3.4 Strategy 

The objective of the information processing strategy is to iteratively 
identify the worst performing components of a given solution and replace 
or swap them with other components. This is achieved through the 
allocation of cost to the components of the solution based on their 
contribution to the overall cost of the solution in the problem domain. 
Once components are assessed they can be ranked and the weaker 
components replaced or switched with a randomly selected component. 

4.3.5 Procedure

Algorithm 4.3.1 provides a pseudocode listing of the Extremal
Optimization algorithm for minimizing a cost function. The deterministic
selection of the worst component in the SelectWeakComponent function
and replacement in the SelectReplacementComponent function is
classical EO. If these decisions are probabilistic, making use of a τ
parameter, this is referred to as τ-Extremal Optimization.

Algorithm 4.3.1: Pseudocode for Extremal Optimization.
Input: ProblemSize, iterations_max, τ
Output: S_best

S_current ← CreateInitialSolution(ProblemSize);
S_best ← S_current;
for i = 1 to iterations_max do
    foreach Component_i ∈ S_current do
        Component_i_cost ← Cost(Component_i, S_current);
    end
    RankedComponents ← Rank(Si_components);
    Component_i ← SelectWeakComponent(RankedComponents, Component_i, τ);
    Component_j ← SelectReplacementComponent(RankedComponents, τ);
    S_candidate ← Replace(S_current, Component_i, Component_j);
    if Cost(S_candidate) < Cost(S_best) then
        S_best ← S_candidate;
    end
end
return S_best;

4.3.6 Heuristics 

• Extremal Optimization was designed for combinatorial optimiza- 
tion problems, although variations have been applied to continuous 
function optimization. 

• The selection of the worst component and the replacement
component each iteration can be deterministic or probabilistic, the latter
of which is referred to as τ-Extremal Optimization given the use
of a τ parameter.

• The selection of an appropriate scoring function of the components
of a solution is the most difficult part in the application of the
technique.

• For τ-Extremal Optimization, low τ values (such as
τ ∈ [1.2, 1.6]) have been found to be effective for the TSP.
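
The effect of τ on component selection can be illustrated with a small sketch. It assumes components are already ranked from worst (rank 1) to best (rank n) and that rank k is chosen with probability proportional to k^(-τ). The helper names rank_probabilities and select_rank are hypothetical and are not part of the listing in this section.

```ruby
# Normalized selection probabilities for ranks 1..n (rank 1 = worst component),
# following the power law P(k) proportional to k**(-tau).
def rank_probabilities(n, tau)
  weights = (1..n).map { |k| k.to_f ** (-tau) }
  total = weights.sum
  weights.map { |w| w / total }
end

# Roulette-wheel draw over the ranks; returns a rank in 1..n.
# `r` is a uniform random number in [0, 1), injectable for repeatable runs.
def select_rank(probabilities, r = rand)
  cumulative = 0.0
  probabilities.each_with_index do |p, i|
    cumulative += p
    return i + 1 if r < cumulative
  end
  probabilities.size
end

# Compare a flat, a moderate, and a sharply peaked distribution.
[0.0, 1.4, 4.0].each do |tau|
  probs = rank_probabilities(5, tau).map { |p| p.round(3) }
  puts "tau=#{tau}: #{probs.inspect}"
end
```

With τ = 0 every rank is equally likely (a random walk over components), while large τ values concentrate almost all probability on rank 1, which approaches the deterministic selection of classical EO.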

4.3.7 Code Listing 

Listing 4.2 provides an example of the Extremal Optimization algorithm 
implemented in the Ruby Programming Language. The algorithm is 
applied to the Berlin52 instance of the Traveling Salesman Problem 
(TSP), taken from the TSPLIB. The problem seeks a permutation of 
the order to visit cities (called a tour) that minimizes the total distance 
traveled. The optimal tour distance for the Berlin52 instance is 7542 units.

The algorithm implementation is based on the seminal work by
Boettcher and Percus [5]. A solution consists of a permutation of
city components. Each city can potentially form a connection to any
other city, and the connections to other cities ordered by distance may
be considered its neighborhood. For a given candidate solution, the city
components of a solution are scored based on the neighborhood rank
of the cities to which they are connected: $fitness_k \leftarrow \frac{3}{r_i + r_j}$, where
$r_i$ and $r_j$ are the neighborhood ranks of cities $i$ and $j$ against city $k$. A
city is selected for modification probabilistically, where the probability
of selecting a given city is proportional to $n_i^{-\tau}$, where $n_i$ is the rank of
city $i$. The longest connection is broken, and the city is connected with
another neighboring city that is also probabilistically selected.

# Euclidean distance between two cities, rounded to the nearest integer
def euc_2d(c1, c2)
  Math.sqrt((c1[0] - c2[0])**2.0 + (c1[1] - c2[1])**2.0).round
end

# Total length of the closed tour described by the permutation
def cost(permutation, cities)
  distance = 0
  permutation.each_with_index do |c1, i|
    c2 = (i==permutation.size-1) ? permutation[0] : permutation[i+1]
    distance += euc_2d(cities[c1], cities[c2])
  end
  return distance
end

# Random tour produced by shuffling the city indices
def random_permutation(cities)
  perm = Array.new(cities.size){|i| i}
  perm.each_index do |i|
    r = rand(perm.size-i) + i
    perm[r], perm[i] = perm[i], perm[r]
  end
  return perm
end

# All other cities ordered by distance from city_number (nearest first)
def calculate_neighbor_rank(city_number, cities, ignore=[])
  neighbors = []
  cities.each_with_index do |city, i|
    next if i==city_number or ignore.include?(i)
    neighbor = {:number=>i}
    neighbor[:distance] = euc_2d(cities[city_number], city)
    neighbors << neighbor
  end
  return neighbors.sort!{|x,y| x[:distance] <=> y[:distance]}
end

# The two cities adjacent to city_number in the tour
def get_edges_for_city(city_number, permutation)
  c1, c2 = nil, nil
  permutation.each_with_index do |c, i|
    if c == city_number
      c1 = (i==0) ? permutation.last : permutation[i-1]
      c2 = (i==permutation.size-1) ? permutation.first : permutation[i+1]
      break
    end
  end
  return [c1, c2]
end

# Fitness of a city: 3 / (sum of the neighborhood ranks of its two tour neighbors)
def calculate_city_fitness(permutation, city_number, cities)
  c1, c2 = get_edges_for_city(city_number, permutation)
  neighbors = calculate_neighbor_rank(city_number, cities)
  n1, n2 = -1, -1
  neighbors.each_with_index do |neighbor, i|
    n1 = i+1 if neighbor[:number] == c1
    n2 = i+1 if neighbor[:number] == c2
    break if n1!=-1 and n2!=-1
  end
  return 3.0 / (n1.to_f + n2.to_f)
end

# Fitness of every city, ordered from highest to lowest fitness
def calculate_city_fitnesses(cities, permutation)
  city_fitnesses = []
  cities.each_with_index do |city, i|
    city_fitness = {:number=>i}
    city_fitness[:fitness] = calculate_city_fitness(permutation, i, cities)
    city_fitnesses << city_fitness
  end
  return city_fitnesses.sort!{|x,y| y[:fitness] <=> x[:fitness]}
end

# Assign each component the unnormalized probability rank**(-tau); returns the sum
def calculate_component_probabilities(ordered_components, tau)
  sum = 0.0
  ordered_components.each_with_index do |component, i|
    component[:prob] = (i+1.0)**(-tau)
    sum += component[:prob]
  end
  return sum
end

# Roulette-wheel selection over the component probabilities
def make_selection(components, sum_probability)
  selection = rand()
  components.each_with_index do |component, i|
    selection -= (component[:prob] / sum_probability)
    return component[:number] if selection <= 0.0
  end
  return components.last[:number]
end

# Select a component by rank, drawing again while the choice is excluded
def probabilistic_selection(ordered_components, tau, exclude=[])
  sum = calculate_component_probabilities(ordered_components, tau)
  selected_city = nil
  begin
    selected_city = make_selection(ordered_components, sum)
  end while exclude.include?(selected_city)
  return selected_city
end

# Reverse the segment of the tour between the selected city and its new neighbor
def vary_permutation(permutation, selected, new, long_edge)
  perm = Array.new(permutation)
  c1, c2 = perm.rindex(selected), perm.rindex(new)
  p1, p2 = (c1<c2) ? [c1,c2] : [c2,c1]
  right = (c1==perm.size-1) ? 0 : c1+1
  if perm[right] == long_edge
    perm[p1+1..p2] = perm[p1+1..p2].reverse
  else
    perm[p1...p2] = perm[p1...p2].reverse
  end
  return perm
end

# The tour neighbor of the selected city that is farther away
def get_long_edge(edges, neighbor_distances)
  n1 = neighbor_distances.find {|x| x[:number]==edges[0]}
  n2 = neighbor_distances.find {|x| x[:number]==edges[1]}
  return (n1[:distance] > n2[:distance]) ? n1[:number] : n2[:number]
end

# Select a weak city and a replacement neighbor, then rewire the tour
def create_new_perm(cities, tau, perm)
  city_fitnesses = calculate_city_fitnesses(cities, perm)
  selected_city = probabilistic_selection(city_fitnesses.reverse, tau)
  edges = get_edges_for_city(selected_city, perm)
  neighbors = calculate_neighbor_rank(selected_city, cities)
  new_neighbor = probabilistic_selection(neighbors, tau, edges)
  long_edge = get_long_edge(edges, neighbors)
  return vary_permutation(perm, selected_city, new_neighbor, long_edge)
end

def search(cities, max_iterations, tau)
  current = {:vector=>random_permutation(cities)}
  current[:cost] = cost(current[:vector], cities)
  best = current
  max_iterations.times do |iter|
    candidate = {}
    candidate[:vector] = create_new_perm(cities, tau, current[:vector])
    candidate[:cost] = cost(candidate[:vector], cities)
    # the candidate always replaces the current solution
    current = candidate
    best = candidate if candidate[:cost] < best[:cost]
    puts " > iter #{(iter+1)}, curr=#{current[:cost]}, best=#{best[:cost]}"
  end
  return best
end

if __FILE__ == $0
  # problem configuration
  berlin52 = [[565,575],[25,185],[345,750],[945,685],[845,655],
   [880,660],[25,230],[525,1000],[580,1175],[650,1130],[1605,620],
   [1220,580],[1465,200],[1530,5],[845,680],[725,370],[145,665],
   [415,635],[510,875],[560,365],[300,465],[520,585],[480,415],
   [835,625],[975,580],[1215,245],[1320,315],[1250,400],[660,180],
   [410,250],[420,555],[575,665],[1150,1160],[700,580],[685,595],
   [685,610],[770,610],[795,645],[720,635],[760,650],[475,960],
   [95,260],[875,920],[700,500],[555,815],[830,485],[1170,65],
   [830,610],[605,625],[595,360],[1340,725],[1740,245]]
  # algorithm configuration
  max_iterations = 250
  tau = 1.8
  # execute the algorithm
  best = search(berlin52, max_iterations, tau)
  puts "Done. Best Solution: c=#{best[:cost]}, v=#{best[:vector].inspect}"
end



Listing 4.2: Extremal Optimization in Ruby 



4.3.8 References 
Primary Sources 

Extremal Optimization was proposed as an optimization heuristic by 
Boettcher and Percus applied to graph partitioning and the Traveling 
Salesman Problem [5]. The approach was inspired by the Bak-Sneppen
self-organized criticality model of co-evolution [1, 2]. 

Learn More 

A number of detailed reviews of Extremal Optimization have been 
presented, including a review and studies by Boettcher and Percus [4], 
an accessible review by Boettcher [3], and a focused study on the Spin 
Glass problem by Boettcher and Percus [6].

4.4 Harmony Search

Harmony Search, HS. 

4.4.1 Taxonomy 

Harmony Search belongs to the fields of Computational Intelligence and 
Metaheuristics. 

4.4.2 Inspiration 

Harmony Search was inspired by the improvisation of Jazz musicians. 
Specifically, the process by which the musicians (who may have never 
played together before) rapidly refine their individual improvisation 
through variation resulting in an aesthetic harmony. 

4.4.3 Metaphor 

Each musician corresponds to an attribute in a candidate solution from 
a problem domain, and each instrument's pitch and range corresponds 
to the bounds and constraints on the decision variable. The harmony 
between the musicians is taken as a complete candidate solution at a 
given time, and the audience's aesthetic appreciation of the harmony
represents the problem-specific cost function. The musicians seek harmony
over time through small variations and improvisations, which results in 
an improvement against the cost function. 

4.4.4 Strategy 

The information processing objective of the technique is to use good
candidate solutions already discovered to influence the creation of new
candidate solutions toward locating the problem's optima. This is achieved
by stochastically creating candidate solutions in a step-wise manner,
where each component is either drawn randomly from a memory of
high-quality solutions, adjusted from the memory of high-quality solutions,
or assigned randomly within the bounds of the problem. The memory of
candidate solutions is initially random, and a greedy acceptance criterion
is used to admit new candidate solutions only if they have an improved
objective value, replacing an existing member.

4.4.5 Procedure 

Algorithm 4.4.1 provides a pseudocode listing of the Harmony Search
algorithm for minimizing a cost function. The adjustment of a pitch
selected from the harmony memory is typically linear, for example for
continuous function optimization:

$x' \leftarrow x + range \times \epsilon$ (4.1)

where $range$ is a user parameter (pitch bandwidth) that controls the
size of the changes, and $\epsilon$ is a uniformly random number $\in [-1, 1]$.
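
Equation 4.1 can be sketched in a few lines of Ruby. The function name adjust_pitch and the injectable epsilon argument are hypothetical conveniences for illustration; clamping the result to the variable's bounds mirrors what Listing 4.3 does, but it is not part of the equation itself.

```ruby
# Pitch adjustment x' = x + range * epsilon, with epsilon uniform in [-1, 1].
# `bounds` is [min, max]; `epsilon` can be passed in to make a run repeatable.
def adjust_pitch(x, range, bounds, epsilon = (2.0 * rand) - 1.0)
  adjusted = x + (range * epsilon)
  # Keep the adjusted pitch inside the bounds of the decision variable.
  adjusted.clamp(bounds[0], bounds[1])
end

puts adjust_pitch(1.0, 0.05, [-5.0, 5.0], 1.0)   # largest upward step
puts adjust_pitch(4.98, 0.05, [-5.0, 5.0], 1.0)  # step that hits the upper bound
```

A small range keeps the adjusted pitch close to the value taken from memory, so the adjustment acts as a local refinement rather than a new random draw.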



Algorithm 4.4.1: Pseudocode for Harmony Search.
Input: Pitch_num, Pitch_bounds, Memory_size, Consolidation_rate,
  PitchAdjust_rate, Improvisation_max
Output: Harmony_best

Harmonies ← InitializeHarmonyMemory(Pitch_num, Pitch_bounds, Memory_size);
EvaluateHarmonies(Harmonies);
for i to Improvisation_max do
  Harmony ← ∅;
  foreach Pitch_i ∈ Pitch_num do
    if Rand() < Consolidation_rate then
      RandomHarmony_pitch^i ← SelectRandomHarmonyPitch(Harmonies, Pitch_i);
      if Rand() < PitchAdjust_rate then
        Harmony_pitch^i ← AdjustPitch(RandomHarmony_pitch^i);
      else
        Harmony_pitch^i ← RandomHarmony_pitch^i;
      end
    else
      Harmony_pitch^i ← RandomPitch(Pitch_bounds);
    end
  end
  EvaluateHarmonies(Harmony);
  if Cost(Harmony) < Cost(Worst(Harmonies)) then
    Worst(Harmonies) ← Harmony;
  end
end
return Harmony_best;



4.4.6 Heuristics 

• Harmony Search was designed as a generalized optimization method
for continuous, discrete, and constrained optimization and has
been applied to numerous types of optimization problems.

• The harmony memory considering rate (HMCR) ∈ [0, 1] controls
the use of information from the harmony memory or the generation
of a random pitch. As such, it controls the rate of convergence of
the algorithm and is typically configured ∈ [0.7, 0.95].

• The pitch adjustment rate (PAR) ∈ [0, 1] controls the frequency
of adjustment of pitches selected from harmony memory, typically
configured ∈ [0.1, 0.5]. High values can result in the premature
convergence of the search.

• The pitch adjustment rate and the adjustment method (amount 
of adjustment or fret width) are typically fixed, having a linear 
effect through time. Non-linear methods have been considered, for 
example refer to Geem [4]. 

• When creating a new harmony, aggregations of pitches can be 
taken from across musicians in the harmony memory. 

• The harmony memory update is typically a greedy process, al- 
though other considerations such as diversity may be used where 
the most similar harmony is replaced. 
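
The interaction of the two rates above can be made concrete with a short calculation. The sketch below assumes the decision structure of Algorithm 4.4.1 (memory consideration first, then an optional adjustment). The function name pitch_source_probabilities is hypothetical.

```ruby
# Probability of each way a single pitch can be produced, given the harmony
# memory considering rate (hmcr) and the pitch adjustment rate (par).
def pitch_source_probabilities(hmcr, par)
  {
    random: 1.0 - hmcr,          # drawn uniformly within the bounds
    memory: hmcr * (1.0 - par),  # copied unchanged from harmony memory
    adjusted: hmcr * par         # copied from memory, then adjusted
  }
end

pitch_source_probabilities(0.95, 0.7).each do |source, p|
  puts format("%-8s %.3f", source, p)
end
```

With HMCR = 0.95 and PAR = 0.7 (the values used in Listing 4.3), roughly 5% of pitches are fresh random draws, 28.5% are copied unchanged from memory, and 66.5% are copied and then adjusted.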

4.4.7 Code Listing 

Listing 4.3 provides an example of the Harmony Search algorithm
implemented in the Ruby Programming Language. The demonstration
problem is an instance of a continuous function optimization that seeks
$\min f(x)$ where $f = \sum_{i=1}^{n} x_i^2$, $-5.0 \leq x_i \leq 5.0$ and $n = 3$. The optimal
solution for this basin function is $(v_0, \ldots, v_{n-1}) = 0.0$. The algorithm
implementation and parameterization are based on the description by
Yang [7], with refinement from Geem [4].

def objective_function(vector)
  return vector.inject(0.0) {|sum, x| sum + (x ** 2.0)}
end

def rand_in_bounds(min, max)
  return min + ((max-min) * rand())
end

def random_vector(search_space)
  return Array.new(search_space.size) do |i|
    rand_in_bounds(search_space[i][0], search_space[i][1])
  end
end

def create_random_harmony(search_space)
  harmony = {}
  harmony[:vector] = random_vector(search_space)
  harmony[:fitness] = objective_function(harmony[:vector])
  return harmony
end

# Oversample random harmonies by `factor` and keep the best mem_size of them
def initialize_harmony_memory(search_space, mem_size, factor=3)
  memory = Array.new(mem_size*factor){create_random_harmony(search_space)}
  memory.sort!{|x,y| x[:fitness]<=>y[:fitness]}
  return memory.first(mem_size)
end

def create_harmony(search_space, memory, consid_rate, adjust_rate, range)
  vector = Array.new(search_space.size)
  search_space.size.times do |i|
    if rand() < consid_rate
      # take the pitch from a random harmony in memory, optionally adjusting it
      value = memory[rand(memory.size)][:vector][i]
      value = value + range*rand_in_bounds(-1.0, 1.0) if rand()<adjust_rate
      value = search_space[i][0] if value < search_space[i][0]
      value = search_space[i][1] if value > search_space[i][1]
      vector[i] = value
    else
      # otherwise draw a random pitch within the bounds
      vector[i] = rand_in_bounds(search_space[i][0], search_space[i][1])
    end
  end
  return {:vector=>vector}
end

def search(bounds, max_iter, mem_size, consid_rate, adjust_rate, range)
  memory = initialize_harmony_memory(bounds, mem_size)
  best = memory.first
  max_iter.times do |iter|
    harm = create_harmony(bounds, memory, consid_rate, adjust_rate, range)
    harm[:fitness] = objective_function(harm[:vector])
    best = harm if harm[:fitness] < best[:fitness]
    # add the new harmony, then drop the worst member of the memory
    memory << harm
    memory.sort!{|x,y| x[:fitness]<=>y[:fitness]}
    memory.delete_at(memory.size-1)
    puts " > iteration=#{iter}, fitness=#{best[:fitness]}"
  end
  return best
end

if __FILE__ == $0
  # problem configuration
  problem_size = 3
  bounds = Array.new(problem_size) {|i| [-5, 5]}
  # algorithm configuration
  mem_size = 20
  consid_rate = 0.95
  adjust_rate = 0.7
  range = 0.05
  max_iter = 500
  # execute the algorithm
  best = search(bounds, max_iter, mem_size, consid_rate, adjust_rate, range)
  puts "done! Solution: f=#{best[:fitness]}, s=#{best[:vector].inspect}"
end

Listing 4.3: Harmony Search in Ruby 



4.4.8 References 
Primary Sources 

Geem et al. proposed the Harmony Search algorithm in 2001, which
was applied to a range of optimization problems including constrained
optimization, the Traveling Salesman Problem, and the design of a water
supply network [6].



Learn More 



A book on Harmony Search, edited by Geem, provides a collection of
papers on the technique and its applications [2]; chapter 1 provides
a useful summary of the method and heuristics for its configuration [7].
Similarly, a second edited volume by Geem focuses on studies that
provide more advanced applications of the approach [5], and chapter
1 provides a detailed walkthrough of the technique itself [4]. Geem
also provides a treatment of Harmony Search applied to the optimal
design of water distribution networks [3] and edits a third volume of
papers related to the application of the technique to structural design
optimization problems [1].

4.5 Cultural Algorithm

Cultural Algorithm, CA. 

4.5.1 Taxonomy 

The Cultural Algorithm is an extension to the field of Evolutionary 
Computation and may be considered a Meta-Evolutionary Algorithm. 
It more broadly belongs to the field of Computational Intelligence 
and Metaheuristics. It is related to other high-order extensions of 
Evolutionary Computation such as the Memetic Algorithm (Section 4.6). 

4.5.2 Inspiration 

The Cultural Algorithm is inspired by the principle of cultural evolution. 
Culture includes the habits, knowledge, beliefs, customs, and morals 
of a member of society. Culture does not exist independent of the 
environment, and can interact with the environment via positive or 
negative feedback cycles. The study of the interaction of culture in the 
environment is referred to as Cultural Ecology. 

4.5.3 Metaphor 

The Cultural Algorithm may be explained in the context of the inspiring 
system. As the evolutionary process unfolds, individuals accumulate 
information about the world which is communicated to other individuals 
in the population. Collectively, this corpus of information is a knowledge
base that members of the population may tap into and exploit. Positive
feedback mechanisms can occur where cultural knowledge indicates 
useful areas of the environment, information which is passed down 
between generations, exploited, refined, and adapted as situations change. 
Additionally, areas of potential hazard may also be communicated 
through the cultural knowledge base. 

4.5.4 Strategy 

The information processing objective of the algorithm is to improve the
learning or convergence of an embedded search technique (typically an
evolutionary algorithm) using a higher-order cultural evolution. The
algorithm operates at two levels: a population level and a cultural level.
The population level is like an evolutionary search, where individuals
represent candidate solutions, are mostly distinct, and their characteristics
are translated into an objective or cost function in the problem domain.
The second level is the knowledge or belief space, where information
acquired by generations is stored, and which is accessible to the current
generation. A communication protocol allows the two spaces to interact
and defines the types of information that can be exchanged.



4.5.5 Procedure 

The focus of the algorithm is the KnowledgeBase data structure that
records different knowledge types based on the nature of the problem.
For example, the structure may be used to record the best candidate
solution found as well as generalized information about areas of the
search space that are expected to pay off (result in good candidate
solutions). This cultural knowledge is discovered by the
population-based evolutionary search, and is in turn used to influence subsequent
generations. The acceptance function constrains the communication of
knowledge from the population to the knowledge base.
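
As a concrete illustration of such a structure, the following sketch stores two knowledge types: the best solution seen so far and per-variable bounds around promising solutions. The names new_knowledge_base, update_situational! and update_normative! are hypothetical, and the choice of exactly these two knowledge types is an assumption made for the example.

```ruby
# A knowledge base with two knowledge types: the best solution seen so far
# (situational) and per-variable bounds of promising regions (normative).
def new_knowledge_base(search_space)
  { situational: nil, normative: search_space.map(&:dup) }
end

# Keep the candidate only if it improves on what is already stored.
def update_situational!(kb, candidate)
  best = kb[:situational]
  kb[:situational] = candidate if best.nil? || candidate[:fitness] < best[:fitness]
end

# Reset the normative bounds so that they enclose the accepted solutions.
def update_normative!(kb, accepted)
  kb[:normative].each_index do |i|
    values = accepted.map { |c| c[:vector][i] }
    kb[:normative][i] = [values.min, values.max]
  end
end

kb = new_knowledge_base([[-5.0, 5.0], [-5.0, 5.0]])
accepted = [
  { vector: [0.5, -1.0], fitness: 1.25 },
  { vector: [-0.2, 0.3], fitness: 0.13 }
]
accepted.each { |c| update_situational!(kb, c) }
update_normative!(kb, accepted)
puts kb.inspect
```

After the update, the normative bounds have shrunk from the full search space to the box spanned by the two accepted solutions, which is the region later generations would be encouraged to sample.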

Algorithm 4.5.1 provides a pseudocode listing of the Cultural Al- 
gorithm. The algorithm is abstract, providing flexibility in the inter- 
pretation of the processes such as the acceptance of information, the 
structure of the knowledge base, and the specific embedded evolutionary 
algorithm. 



Algorithm 4.5.1: Pseudocode for the Cultural Algorithm.
Input: Problem_size, Population_num
Output: KnowledgeBase

Population ← InitializePopulation(Problem_size, Population_num);
KnowledgeBase ← InitializeKnowledgebase(Problem_size, Population_num);
while ¬StopCondition() do
  Evaluate(Population);
  SituationalKnowledge_candidate ← AcceptSituationalKnowledge(Population);
  UpdateSituationalKnowledge(KnowledgeBase, SituationalKnowledge_candidate);
  Children ← ReproduceWithInfluence(Population, KnowledgeBase);
  Population ← Select(Children, Population);
  NormativeKnowledge_candidate ← AcceptNormativeKnowledge(Population);
  UpdateNormativeKnowledge(KnowledgeBase, NormativeKnowledge_candidate);
end
return KnowledgeBase;

4.5.6 Heuristics

• The Cultural Algorithm was initially used as a simulation tool
to investigate Cultural Ecology. It has been adapted for use
as an optimization algorithm for a wide variety of domains including,
but not limited to, constraint optimization, combinatorial optimization,
and continuous function optimization.

• The knowledge base structure provides a mechanism for incor- 
porating problem-specific information into the execution of an 
evolutionary search. 

• The acceptance functions that control the flow of information 
into the knowledge base are typically greedy, only including the 
best information from the current generation, and not replacing 
existing knowledge unless it is an improvement. 

• Acceptance functions are traditionally deterministic, although 
probabilistic and fuzzy acceptance functions have been investi- 
gated. 
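
A deterministic, greedy acceptance function of the kind described above might look like the following sketch. The name accept_top and the use of a fixed fraction of the population are assumptions for illustration; other acceptance rules are possible.

```ruby
# Greedy acceptance: only the best `fraction` of the population (lowest
# fitness values, since the problem is minimization) may contribute
# knowledge. At least one individual is always accepted.
def accept_top(population, fraction)
  count = [(population.size * fraction).round, 1].max
  population.sort_by { |c| c[:fitness] }.first(count)
end

population = [4.0, 0.5, 2.5, 1.0, 3.0].map { |f| { fitness: f } }
puts accept_top(population, 0.4).inspect
```

The accepted individuals would then be passed to whatever routine updates the knowledge base, so the fraction directly controls how quickly cultural knowledge narrows around the current best solutions.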



4.5.7 Code Listing 

Listing 4.4 provides an example of the Cultural Algorithm implemented
in the Ruby Programming Language. The demonstration problem is
an instance of a continuous function optimization that seeks $\min f(x)$
where $f = \sum_{i=1}^{n} x_i^2$, $-5.0 \leq x_i \leq 5.0$ and $n = 2$. The optimal solution
for this basin function is $(v_0, \ldots, v_{n-1}) = 0.0$.

The Cultural Algorithm was implemented based on the description
of the Cultural Algorithm Evolutionary Program (CAEP) presented
by Reynolds [4]. A real-valued Genetic Algorithm was used as the
embedded evolutionary algorithm. The overall best solution is taken
as the 'situational' cultural knowledge, whereas the bounds of the top
20% of the best solutions each generation are taken as the 'normative'
cultural knowledge. The situational knowledge is returned as the result
of the search, whereas the normative knowledge is used to influence
the evolutionary process. Specifically, vector bounds in the normative
knowledge are used to define a subspace from which new candidate
solutions are uniformly sampled during the reproduction step of the
evolutionary algorithm's variation mechanism. A real-valued representation
and a binary tournament selection strategy are used by the evolutionary
algorithm.

def objective_function(vector)
  return vector.inject(0.0) {|sum, x| sum + (x ** 2.0)}
end

def rand_in_bounds(min, max)
  return min + ((max-min) * rand())
end

def random_vector(minmax)
  return Array.new(minmax.size) do |i|
    rand_in_bounds(minmax[i][0], minmax[i][1])
  end
end

# Sample a new candidate from the normative bounds, clipped to the search space
def mutate_with_inf(candidate, beliefs, minmax)
  v = Array.new(candidate[:vector].size)
  candidate[:vector].each_with_index do |c,i|
    v[i]=rand_in_bounds(beliefs[:normative][i][0],beliefs[:normative][i][1])
    v[i] = minmax[i][0] if v[i] < minmax[i][0]
    v[i] = minmax[i][1] if v[i] > minmax[i][1]
  end
  return {:vector=>v}
end

def binary_tournament(pop)
  i, j = rand(pop.size), rand(pop.size)
  j = rand(pop.size) while j==i
  return (pop[i][:fitness] < pop[j][:fitness]) ? pop[i] : pop[j]
end

# Normative knowledge starts as a copy of the search space bounds
def initialize_beliefspace(search_space)
  belief_space = {}
  belief_space[:situational] = nil
  belief_space[:normative] = Array.new(search_space.size) do |i|
    Array.new(search_space[i])
  end
  return belief_space
end

# Situational knowledge: the best solution found so far
def update_beliefspace_situational!(belief_space, best)
  curr_best = belief_space[:situational]
  if curr_best.nil? or best[:fitness] < curr_best[:fitness]
    belief_space[:situational] = best
  end
end

# Normative knowledge: per-variable bounds of the accepted solutions
def update_beliefspace_normative!(belief_space, acc)
  belief_space[:normative].each_with_index do |bounds,i|
    bounds[0] = acc.min{|x,y| x[:vector][i]<=>y[:vector][i]}[:vector][i]
    bounds[1] = acc.max{|x,y| x[:vector][i]<=>y[:vector][i]}[:vector][i]
  end
end

def search(max_gens, search_space, pop_size, num_accepted)
  # initialize
  pop = Array.new(pop_size) { {:vector=>random_vector(search_space)} }
  belief_space = initialize_beliefspace(search_space)
  # evaluate
  pop.each{|c| c[:fitness] = objective_function(c[:vector])}
  best = pop.sort{|x,y| x[:fitness] <=> y[:fitness]}.first
  # update situational knowledge
  update_beliefspace_situational!(belief_space, best)
  max_gens.times do |gen|
    # create next generation
    children = Array.new(pop_size) do |i|
      mutate_with_inf(pop[i], belief_space, search_space)
    end
    # evaluate
    children.each{|c| c[:fitness] = objective_function(c[:vector])}
    best = children.sort{|x,y| x[:fitness] <=> y[:fitness]}.first
    # update situational knowledge
    update_beliefspace_situational!(belief_space, best)
    # select next generation
    pop = Array.new(pop_size) { binary_tournament(children + pop) }
    # update normative knowledge
    pop.sort!{|x,y| x[:fitness] <=> y[:fitness]}
    accepted = pop[0...num_accepted]
    update_beliefspace_normative!(belief_space, accepted)
    # user feedback
    puts " > generation=#{gen}, f=#{belief_space[:situational][:fitness]}"
  end
  return belief_space[:situational]
end

if __FILE__ == $0
  # problem configuration
  problem_size = 2
  search_space = Array.new(problem_size) {|i| [-5, +5]}
  # algorithm configuration
  max_gens = 200
  pop_size = 100
  num_accepted = (pop_size*0.20).round
  # execute the algorithm
  best = search(max_gens, search_space, pop_size, num_accepted)
  puts "done! Solution: f=#{best[:fitness]}, s=#{best[:vector].inspect}"
end

Listing 4.4: Cultural Algorithm in Ruby 



4.5.8 References 
Primary Sources 

The Cultural Algorithm was proposed by Reynolds in 1994, who
combined the method with the Version Space Algorithm (a binary string
based Genetic Algorithm), where generalizations of individual solutions
were communicated as cultural knowledge in the form of schema patterns
(strings of 1's, 0's and #'s, where '#' represents a wildcard) [3].

Learn More 

Chung and Reynolds provide a study of the Cultural Algorithm on a
testbed of constraint satisfaction problems [1]. Reynolds provides a
detailed overview of the history of the technique as a book chapter that
presents the state of the art and summaries of application areas including
concept learning and continuous function optimization [4]. Coello Coello
and Becerra proposed a variation of the Cultural Algorithm that uses
Evolutionary Programming as the embedded weak search method, for
use with Multi-Objective Optimization problems [2].

4.6 Memetic Algorithm

Memetic Algorithm, MA. 

4.6.1 Taxonomy 

Memetic Algorithms have elements of Metaheuristics and Computational 
Intelligence. Although they have principles of Evolutionary Algorithms, 
they may not strictly be considered an Evolutionary Technique. Memetic 
Algorithms have functional similarities to Baldwinian Evolutionary 
Algorithms, Lamarckian Evolutionary Algorithms, Hybrid Evolutionary 
Algorithms, and Cultural Algorithms (Section 4.5). Using ideas of 
memes and Memetic Algorithms in optimization may be referred to as 
Memetic Computing. 

4.6.2 Inspiration 

Memetic Algorithms are inspired by the interplay of genetic evolution 
and memetic evolution. Universal Darwinism is the generalization of 
genes beyond biological-based systems to any system where discrete 
units of information can be inherited and be subjected to evolutionary 
forces of selection and variation. The term 'meme' is used to refer to
a piece of discrete cultural information, suggesting the interplay of
genetic and cultural evolution.

4.6.3 Metaphor 

The genotype is evolved based on the interaction the phenotype has with
the environment. This interaction is mediated by cultural phenomena
that influence the selection mechanisms, and even the pairing and
recombination mechanisms. Cultural information is shared between
individuals, spreading through the population as memes relative to their
fitness or the fitness the memes impart to the individuals. Collectively, the
interplay of the genotype and the memotype strengthens the fitness of the
population in the environment.

4.6.4 Strategy 

The objective of the information processing strategy is to exploit a
population-based global search technique to broadly locate good
areas of the search space, combined with the repeated usage of a local
search heuristic by individual solutions to locate local optima. Ideally,
memetic algorithms embrace the duality of genetic and cultural
evolution, allowing the transmission, selection, inheritance, and variation of
memes as well as genes.

4.6.5 Procedure

Algorithm 4.6.1 provides a pseudocode listing of the Memetic Algorithm
for minimizing a cost function. The procedure describes a simple or first
order Memetic Algorithm that shows the improvement of individual
solutions separate from a global search, although it does not show the
independent evolution of memes.

Algorithm 4.6.1: Pseudocode for the Memetic Algorithm.
Input: ProblemSize, Pop_size, MemePop_size
Output: S_best

Population ← InitializePopulation(ProblemSize, Pop_size);
while ¬StopCondition() do
  foreach S_i ∈ Population do
    S_i^cost ← Cost(S_i);
  end
  S_best ← GetBestSolution(Population);
  Population ← StochasticGlobalSearch(Population);
  MemeticPopulation ← SelectMemeticPopulation(Population, MemePop_size);
  foreach S_i ∈ MemeticPopulation do
    S_i ← LocalSearch(S_i);
  end
end
return S_best;



4.6.6 Heuristics 

• The global search provides the broad exploration mechanism, 
whereas the individual solution improvement via local search pro- 
vides an exploitation mechanism. 

• Balance is needed between the local and global mechanisms to en- 
sure the system does not prematurely converge to a local optimum 
and does not consume unnecessary computational resources. 

• The local search should be problem and representation specific,
whereas the global search may be generic and non-specific
(black-box).

• Memetic Algorithms have been applied to a range of constraint,
combinatorial, and continuous function optimization problem
domains.

4.6.7 Code Listing
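
The balance between global and local search can be reasoned about in terms of objective function evaluations. The sketch below assumes one interpretation of SelectMemeticPopulation from Algorithm 4.6.1 (choosing the lowest-cost individuals) and a local search that spends a fixed number of evaluations per individual. Both function names and both assumptions are illustrative only.

```ruby
# Pick the individuals that will receive local search. Choosing the
# `meme_pop_size` lowest-cost members is one option among many.
def select_memetic_population(population, meme_pop_size)
  population.min_by(meme_pop_size) { |c| c[:fitness] }
end

# Evaluations spent per generation: one per individual for the global
# search, plus `local_steps` for each locally refined individual.
def evaluations_per_generation(pop_size, meme_pop_size, local_steps)
  pop_size + (meme_pop_size * local_steps)
end

population = [3.0, 1.0, 2.0, 5.0].map { |f| { fitness: f } }
puts select_memetic_population(population, 2).inspect
puts evaluations_per_generation(100, 10, 20)
```

Under these assumptions, refining 10 of 100 individuals for 20 steps each triples the cost of a generation, which shows how quickly the local search can come to dominate the computational budget.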

Listing 4.5 provides an example of the Memetic Algorithm implemented
in the Ruby Programming Language. The demonstration problem is
an instance of a continuous function optimization that seeks $\min f(x)$
where $f = \sum_{i=1}^{n} x_i^2$, $-5.0 \leq x_i \leq 5.0$ and $n = 3$. The optimal solution
for this basin function is $(v_0, \ldots, v_{n-1}) = 0.0$. The Memetic Algorithm
uses a canonical Genetic Algorithm as the global search technique that
operates on binary strings, uses tournament selection, point mutations,
uniform crossover and a binary coded decimal decoding of bits to real
values. A bit climber local search is used that performs probabilistic
bit flips (point mutations) and only accepts solutions with the same or
improving fitness.
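
The decoding of bits to real values can be checked on a single parameter. The helper decode_param below is hypothetical and uses String#to_i with base 2 instead of an explicit loop over the bits; it assumes the first character is the most significant bit and that the integer is scaled linearly onto the interval [min, max].

```ruby
# Map a bitstring to a real value in [min, max]: the bits are read as an
# unsigned integer and scaled linearly onto the interval.
def decode_param(bits, min, max)
  min + ((max - min) / ((2.0**bits.size) - 1.0)) * bits.to_i(2)
end

puts decode_param("0" * 16, -5.0, 5.0)             # all zeros: lower bound
puts decode_param("1" * 16, -5.0, 5.0)             # all ones: upper bound
puts decode_param("1000000000000000", -5.0, 5.0)   # near the middle
```

With 16 bits per parameter the interval is divided into 65535 steps, so a single flip of the least significant bit moves the decoded value by about 0.00015 on this problem, while a flip of the most significant bit moves it by roughly half the range.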

def objective_function(vector)
  return vector.inject(0.0) {|sum, x| sum + (x ** 2.0)}
end

def random_bitstring(num_bits)
  return (0...num_bits).inject(""){|s,i| s<<((rand<0.5) ? "1" : "0")}
end

# Decode each group of bits_per_param bits into a real value within its bounds
def decode(bitstring, search_space, bits_per_param)
  vector = []
  search_space.each_with_index do |bounds, i|
    off, sum = i*bits_per_param, 0.0
    param = bitstring[off...(off+bits_per_param)].reverse
    param.size.times do |j|
      sum += ((param[j].chr=='1') ? 1.0 : 0.0) * (2.0 ** j.to_f)
    end
    min, max = bounds
    vector << min + ((max-min)/((2.0**bits_per_param.to_f)-1.0)) * sum
  end
  return vector
end

def fitness(candidate, search_space, param_bits)
  candidate[:vector]=decode(candidate[:bitstring], search_space, param_bits)
  candidate[:fitness] = objective_function(candidate[:vector])
end

def binary_tournament(pop)
  i, j = rand(pop.size), rand(pop.size)
  j = rand(pop.size) while j==i
  return (pop[i][:fitness] < pop[j][:fitness]) ? pop[i] : pop[j]
end

# Flip each bit independently with probability rate
def point_mutation(bitstring, rate=1.0/bitstring.size)
  child = ""
  bitstring.size.times do |i|
    bit = bitstring[i].chr
    child << ((rand()<rate) ? ((bit=='1') ? "0" : "1") : bit)
  end
  return child
end

def crossover (parentl , parent2, rate) 
return ""+parentl if rand()>=rate 
child = "" 

parentl . size . times do |i| 

child « ((rand()<0.5) ? parentl [i] . chr : parent2 [i] . chr) 
end 

return child 
end 

def reproduce (selected, pop_size, p_cross, p_mut) 
children = [] 

selected. each_with_index do I pi, i| 
p2 = (i.modulo(2)==0) ? selected [i+1] : selected[i-l] 
p2 = selected[0] if i == selected. size-1 
child = {} 

child [:bitstring] = crossover (pi [:bitstring] , p2 [:bitstring] , 
p_cross) 

child [:bitstring] = point_mutation(child[:bitstring] , p_mut) 
children << child 

break if children. size >= pop_size 
end 

return children 
end 

def bitclimber (child, search_space , p_mut, max_local_gens , 
bits_per_param) 
current = child 
max_local_gens .times do 
candidate = {} 

candidate [ :bitstring] = point_mutation(current [:bitstring] , p_mut) 
f itness (candidate , search_space , bits_per_param) 
current = candidate if candidate [: fitness] <= current [: fitness] 
end 

return current 
end 

def search (max_gens , search_space , pop_size, p_cross, p_mut, 
max_local_gens , 
p_local, bits_per_param=16) 
pop = Array .new(pop_size) do |i| 

{ : bitstring=>random_bitstring (search_space . size*bits_per_param) } 
end 

pop. each{ I candidate I fitness (candidate , search_space , bits_per_param) 
} 

gen, best = 0, pop . sort{ I x,y I x[:fitness] <=> y [: fitness] }. first 

max_gens .times do I gen I 

selected = Array .new(pop_size){ I i I binary_tournament (pop) } 
children = reproduce (selected, pop_size, p_cross, p_mut) 
children. each{ I cand I f itness(cand, search_space , bits_per_param)} 
pop = [] 

children. each do I child I 
if randQ < p_local 

child = bitclimber (child, search_space , p_mut, max_local_gens , 
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bits_per_param) 

end 

pop << child 
end 

pop . sort ! { I x, y I x [: fitness] <=> y [: fitness] } 
best = pop. first if pop . first [: fitness] <= best [: fitness] 
puts ">gen=#{gen} , f =#{best [: fitness] } , b=#{best [ :bitstring] }" 
end 

return best 
end 

if __FILE__ == $0 

# problem configuration 
problem_size = 3 

search_space = Array. new(problem_size) {|i| [-5, +5]} 

# algorithm configuration 
max_gens = 100 
pop_size = 100 

p_cross = 0.98 

p_mut = 1 . 0/ (problem_size*16) . to_f 
max_local_gens = 20 
p_local = 0.5 

# execute the algorithm 

best = search (max_gens , search_space , pop_size, p_cross, p_mut, 

max_local_gens , p_local) 
puts "done! Solution: f =#{best [: fitness] } , b=#{best [ :bitstring] } , 

v=#{best [: vector] .inspect}" 

end 

Listing 4.5: Memetic Algorithm in Ruby 



4.6.8 References 
Primary Sources 

The concept of a Memetic Algorithm is credited to Moscato [5], who 
was inspired by the description of memes in Dawkins' "The Selfish
Gene" [1]. Moscato proposed Memetic Algorithms as the marriage 
between population based global search and heuristic local search made 
by each individual without the constraints of a genetic representation 
and investigated variations on the Traveling Salesman Problem. 

Learn More 

Moscato and Cotta provide a gentle introduction to the field of Memetic 
Algorithms as a book chapter that covers formal descriptions of the 
approach, a summary of the fields of application, and the state of 
the art [6]. An overview and classification of the types of Memetic
Algorithms is presented by Ong et al. who describe a class of adaptive
Memetic Algorithms [7]. Krasnogor and Smith also provide a taxonomy
of Memetic Algorithms, focusing on the properties needed to design
'competent' implementations of the approach with examples on a number
of combinatorial optimization problems [4]. Work by Krasnogor and
Gustafson investigates what they refer to as 'self-generating' Memetic
Algorithms that use the memetic principle to co-evolve the local search
applied by individual solutions [3]. For a broader overview of the field,
see the 2005 book "Recent Advances in Memetic Algorithms" that
provides an overview and a number of studies [2].
Chapter 5

Probabilistic Algorithms 



5.1 Overview 

This chapter describes Probabilistic Algorithms.

5.1.1 Probabilistic Models 

Probabilistic Algorithms are those algorithms that model a problem
or search a problem space using a probabilistic model of candidate
solutions. Many Metaheuristics and Computational Intelligence algorithms
may be considered probabilistic, although what distinguishes the
algorithms in this chapter is the explicit (rather than implicit) use of
the tools of probability in problem solving. The majority of the algorithms
described in this Chapter are referred to as Estimation of Distribution
Algorithms.

5.1.2 Estimation of Distribution Algorithms 

Estimation of Distribution Algorithms (EDA), also called Probabilistic
Model-Building Genetic Algorithms (PMBGA), are an extension of the
field of Evolutionary Computation that model a population of candidate 
solutions as a probabilistic model. They generally involve iterations that 
alternate between creating candidate solutions in the problem space from 
a probabilistic model, and reducing a collection of generated candidate 
solutions into a probabilistic model. 

The model at the heart of an EDA typically provides the probabilistic 
expectation of a component or component configuration comprising part 
of an optimal solution. This estimation is typically based on the observed 
frequency of use of the component in better than average candidate 
solutions. The probabilistic model is used to generate candidate solutions 
in the problem space, typically in a component-wise or step-wise manner
using a domain specific construction method to ensure validity. 
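The alternation between sampling from a model and re-estimating the model can be sketched as follows. This is a minimal illustration only, assuming binary strings, a univariate model (one probability per bit), OneMax-style scoring by the number of 1 bits, and truncation selection; all function and parameter names are hypothetical.

```ruby
# Build one candidate solution component-wise from the model.
def sample_solution(model, rng)
  model.map { |p| rng.rand < p ? 1 : 0 }
end

# Reduce a collection of solutions to a model: the observed frequency
# of a 1 at each position.
def estimate_model(solutions)
  count = solutions.size.to_f
  solutions.transpose.map { |column| column.sum / count }
end

# Alternate between sampling and model estimation for a fixed number
# of iterations; returns the final probability vector.
def eda_sketch(num_bits, pop_size, keep, iterations, rng = Random.new(1))
  model = Array.new(num_bits, 0.5)
  iterations.times do
    population = Array.new(pop_size) { sample_solution(model, rng) }
    # Assumption: a solution is scored by its number of 1 bits and only
    # the best 'keep' solutions inform the next model.
    better = population.max_by(keep) { |solution| solution.sum }
    model = estimate_model(better)
  end
  model
end
```

The algorithms described in this chapter differ mainly in what the model contains and how it is updated, while the outer loop keeps this general shape.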
Pelikan et al. provide a comprehensive summary of the field of
probabilistic optimization algorithms, summarizing the core approaches
and their differences [10]. The edited volume by Pelikan, Sastry, and
Cantú-Paz provides a collection of studies on the popular Estimation of
Distribution algorithms as well as methodology for designing algorithms
and application demonstration studies [13]. An edited volume on studies
of EDAs by Larrañaga and Lozano [4] and the follow-up volume by
Lozano et al. [5] provide an applied foundation for the field.

5.1.3 Extensions 

Many other algorithms and classes of algorithm from the field of
Estimation of Distribution Algorithms are not described in this chapter,
including but not limited to:

• Extensions to UMDA: Extensions to the Univariate Marginal 
Distribution Algorithm such as the Bivariate Marginal Distribu- 
tion Algorithm (BMDA) [11, 12] and the Factorized Distribution 
Algorithm (FDA) [7]. 

• Extensions to cGA: Extensions to the Compact Genetic Algo- 
rithm such as the Extended Compact Genetic Algorithm (ECGA) 
[2, 3]. 

• Extensions to BOA: Extensions to the Bayesian Optimization
Algorithm such as the Hierarchical Bayesian Optimization Algorithm
(hBOA) [8, 9] and the Incremental Bayesian Optimization
Algorithm (iBOA) [14].

• Bayesian Network Algorithms: Other Bayesian network algorithms
such as the Estimation of Bayesian Network Algorithm [1],
and the Learning Factorized Distribution Algorithm (LFDA) [6].

• PIPE: The Probabilistic Incremental Program Evolution that 
uses EDA methods for constructing programs [16]. 

• SHCLVND: The Stochastic Hill-Climbing with Learning by Vec- 
tors of Normal Distributions algorithm [15]. 
5.2 Population-Based Incremental Learning

Population-Based Incremental Learning, PBIL. 

5.2.1 Taxonomy 

Population-Based Incremental Learning is an Estimation of Distribution
Algorithm (EDA), also referred to as Probabilistic Model-Building
Genetic Algorithms (PMBGA), an extension to the field of Evolutionary
Computation. PBIL is related to other EDAs such as the Compact
Genetic Algorithm (Section 5.4), the Probabilistic Incremental Program
Evolution algorithm, and the Bayesian Optimization Algorithm
(Section 5.5). The fact that the algorithm maintains a single prototype
vector that is updated competitively shows some relationship to the
Learning Vector Quantization algorithm (Section 8.5).

5.2.2 Inspiration 

Population-Based Incremental Learning is a population-based technique 
without an inspiration. It is related to the Genetic Algorithm and other 
Evolutionary Algorithms that are inspired by the biological theory of 
evolution by means of natural selection. 

5.2.3 Strategy 

The information processing objective of the PBIL algorithm is to reduce
the memory required by the genetic algorithm. This is done by reducing
the population of candidate solutions to a single prototype vector of
attributes from which candidate solutions can be generated and assessed.
Updates and mutation operators are also applied to the prototype
vector, rather than to the generated candidate solutions.

5.2.4 Procedure 

The Population-Based Incremental Learning algorithm maintains a 
real-valued prototype vector that represents the probability of each 
component being expressed in a candidate solution. Algorithm 5.2.1
provides a pseudocode listing of the Population-Based Incremental
Learning algorithm for minimizing a cost function.

5.2.5 Heuristics 

• PBIL was designed to optimize the probability of components
from low cardinality sets, such as bits in a binary string.
Algorithm 5.2.1: Pseudocode for PBIL.

Input: Bits_num, Samples_num, Learn_rate, P_mutation, Mutation_factor
Output: S_best

V ← InitializeVector(Bits_num);
S_best ← ∅;
while ¬StopCondition() do
    S_current ← ∅;
    for i to Samples_num do
        S_i ← GenerateSamples(V);
        if Cost(S_i) < Cost(S_current) then
            S_current ← S_i;
            if Cost(S_i) < Cost(S_best) then
                S_best ← S_i;
            end
        end
    end
    foreach S^i_bit ∈ S_current do
        V^i_bit ← V^i_bit × (1.0 − Learn_rate) + S^i_bit × Learn_rate;
        if Rand() < P_mutation then
            V^i_bit ← V^i_bit × (1.0 − Mutation_factor) + Rand() × Mutation_factor;
        end
    end
end
return S_best;



• The algorithm has a very small memory footprint (compared to
some population-based evolutionary algorithms) given the compression
of information into a single prototype vector.

• Extensions to PBIL have been proposed that extend the representation
beyond sets to real-valued vectors.

• Variants of PBIL that were proposed in the original paper include
updating the prototype vector with more than one competitive
candidate solution (such as an average of top candidate solutions),
and moving the prototype vector away from the least competitive
candidate solution each iteration.

• Low learning rates are preferred, such as 0.1.
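The two variants mentioned in the heuristics can be sketched as alternative update rules for the prototype vector. This is one possible reading rather than the formulation from the original paper: it assumes a maximizing cost (as in OneMax), candidates stored as hashes with :bitstring and :cost keys, and a separate negative learning rate applied only where the best and worst candidates disagree. The function names are hypothetical.

```ruby
# Variant 1: move the prototype toward the average of the top_k candidates.
def update_toward_top(prototype, candidates, top_k, learn_rate)
  top = candidates.max_by(top_k) { |c| c[:cost] }
  prototype.each_index.map do |i|
    # Fraction of the top candidates with a 1 at position i.
    mean_bit = top.sum { |c| c[:bitstring][i] } / top.size.to_f
    prototype[i] * (1.0 - learn_rate) + mean_bit * learn_rate
  end
end

# Variant 2: move the prototype away from the worst candidate by pushing
# it toward the best candidate's bit wherever the two differ.
def update_away_from_worst(prototype, best, worst, neg_learn_rate)
  prototype.each_index.map do |i|
    if best[:bitstring][i] != worst[:bitstring][i]
      prototype[i] * (1.0 - neg_learn_rate) + best[:bitstring][i] * neg_learn_rate
    else
      prototype[i]
    end
  end
end
```

Both functions return a new vector and leave their arguments unchanged, so either could be tried in place of a single-candidate update without altering the rest of a PBIL loop.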
5.2.6 Code Listing

Listing 5.1 provides an example of the Population-Based Incremental 
Learning algorithm implemented in the Ruby Programming Language. 
The demonstration problem is a maximizing binary optimization problem 
called OneMax that seeks a binary string of unity (all '1' bits). The 
objective function only provides an indication of the number of correct 
bits in a candidate string, not the positions of the correct bits. The 
algorithm is an implementation of the simple PBIL algorithm that 
updates the prototype vector based on the best candidate solution 
generated each iteration. 



# Count the number of 1 bits.
def onemax(vector)
  return vector.inject(0){|sum, value| sum + value}
end

# Sample a bitstring from the prototype vector of probabilities.
def generate_candidate(vector)
  candidate = {}
  candidate[:bitstring] = Array.new(vector.size)
  vector.each_with_index do |p, i|
    candidate[:bitstring][i] = (rand()<p) ? 1 : 0
  end
  return candidate
end

# Move each probability toward the corresponding bit of the best sample.
def update_vector(vector, current, lrate)
  vector.each_with_index do |p, i|
    vector[i] = p*(1.0-lrate) + current[:bitstring][i]*lrate
  end
end

# Perturb each probability toward a random value with probability rate.
def mutate_vector(vector, current, coefficient, rate)
  vector.each_with_index do |p, i|
    if rand() < rate
      vector[i] = p*(1.0-coefficient) + rand()*coefficient
    end
  end
end

def search(num_bits, max_iter, num_samples, p_mutate, mut_factor, l_rate)
  vector = Array.new(num_bits){0.5}
  best = nil
  max_iter.times do |iter|
    current = nil
    num_samples.times do
      candidate = generate_candidate(vector)
      candidate[:cost] = onemax(candidate[:bitstring])
      current = candidate if current.nil? or candidate[:cost]>current[:cost]
      best = candidate if best.nil? or candidate[:cost]>best[:cost]
    end
    update_vector(vector, current, l_rate)
    mutate_vector(vector, current, mut_factor, p_mutate)
    puts " >iteration=#{iter}, f=#{best[:cost]}, s=#{best[:bitstring]}"
    break if best[:cost] == num_bits
  end
  return best
end

if __FILE__ == $0
  # problem configuration
  num_bits = 64
  # algorithm configuration
  max_iter = 100
  num_samples = 100
  p_mutate = 1.0/num_bits
  mut_factor = 0.05
  l_rate = 0.1
  # execute the algorithm
  best = search(num_bits, max_iter, num_samples, p_mutate, mut_factor, l_rate)
  puts "done! Solution: f=#{best[:cost]}/#{num_bits}, s=#{best[:bitstring]}"
end

Listing 5.1: Population-Based Incremental Learning in Ruby 



5.2.7 References 
Primary Sources 

The Population-Based Incremental Learning algorithm was proposed by 
Baluja in a technical report that proposed the base algorithm as well 
as a number of variants inspired by the Learning Vector Quantization 
algorithm [1]. 

Learn More 

Baluja and Caruana provide an excellent overview of PBIL and compare
it to the standard Genetic Algorithm, released as a technical report [3]
and later published [4]. Baluja provides a detailed comparison between
the Genetic Algorithm and PBIL on a range of problems and scales in
another technical report [2]. Greene provided an excellent account on the
applicability of PBIL as a practical optimization algorithm [5]. Höhfeld
and Rudolph provide the first theoretical analysis of the technique and
provide a convergence proof [6].
5.3 Univariate Marginal Distribution Algorithm

Univariate Marginal Distribution Algorithm, UMDA, Univariate Marginal 
Distribution, UMD. 

5.3.1 Taxonomy 

The Univariate Marginal Distribution Algorithm belongs to the field
of Estimation of Distribution Algorithms (EDA), also referred to as
Probabilistic Model-Building Genetic Algorithms (PMBGA), an extension
to the field of Evolutionary Computation. UMDA is closely related to the
Factorized Distribution Algorithm (FDA) and an extension called the
Bivariate Marginal Distribution Algorithm (BMDA). UMDA is related
to other EDAs such as the Compact Genetic Algorithm (Section 5.4),
the Population-Based Incremental Learning algorithm (Section 5.2), and
the Bayesian Optimization Algorithm (Section 5.5).

5.3.2 Inspiration 

The Univariate Marginal Distribution Algorithm is a population-based
technique without an inspiration. It is related to the Genetic Algorithm
and other Evolutionary Algorithms that are inspired by the biological
theory of evolution by means of natural selection.

5.3.3 Strategy 

The information processing strategy of the algorithm is to use the
frequency of the components in a population of candidate solutions
in the construction of new candidate solutions. This is achieved by
first measuring the frequency of each component in the population
(the univariate marginal probability) and using the probabilities to
influence the probabilistic selection of components in the
component-wise construction of new candidate solutions.

5.3.4 Procedure 

Algorithm 5.3.1 provides a pseudocode listing of the Univariate Marginal 
Distribution Algorithm for minimizing a cost function. 

5.3.5 Heuristics 

• UMDA was designed for problems where the components of a 
solution are independent (linearly separable). 
Algorithm 5.3.1: Pseudocode for the UMDA.

Input: Bits_num, Population_size, Selection_size
Output: S_best

Population ← InitializePopulation(Bits_num, Population_size);
EvaluatePopulation(Population);
S_best ← GetBestSolution(Population);
while ¬StopCondition() do
    Selected ← SelectFitSolutions(Population, Selection_size);
    V ← CalculateFrequencyOfComponents(Selected);
    Offspring ← ∅;
    for i to Population_size do
        Offspring ← ProbabilisticallyConstructSolution(V);
    end
    EvaluatePopulation(Offspring);
    S_best ← GetBestSolution(Offspring);
    Population ← Offspring;
end
return S_best;




• A selection method is needed to identify the subset of good solutions
from which to calculate the univariate marginal probabilities.
Many selection methods from the field of Evolutionary Computation
may be used.

5.3.6 Code Listing 

Listing 5.2 provides an example of the Univariate Marginal Distribution 
Algorithm implemented in the Ruby Programming Language. The 
demonstration problem is a maximizing binary optimization problem 
called OneMax that seeks a binary string of unity (all '1' bits). The 
objective function provides only an indication of the number of correct 
bits in a candidate string, not the positions of the correct bits. 

The algorithm is an implementation of UMDA that uses the integers 
1 and 0 to represent bits in a binary string representation. A binary tour- 
nament selection strategy is used and the whole population is replaced 
each iteration. The mechanisms from Evolutionary Computation such 
as elitism and more elaborate selection methods may be implemented 
as an extension. 
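As an illustration of the elitism extension mentioned above, the following sketch carries the best members of the previous population over into the next one. It assumes a maximizing fitness stored under the :fitness key of each candidate; the function name and the num_elites parameter are hypothetical and do not appear in the listing.

```ruby
# Keep the num_elites best members of the old population and fill the
# remaining places with the best of the newly sampled offspring.
def replace_with_elitism(population, offspring, num_elites)
  # Guard against asking for more elites than either group can supply.
  num_elites = [num_elites, population.size, offspring.size].min
  elites = population.max_by(num_elites) { |c| c[:fitness] }
  survivors = offspring.max_by(offspring.size - num_elites) { |c| c[:fitness] }
  elites + survivors
end
```

The population size is unchanged by the replacement, so this could stand in for the step that replaces the whole population with the samples each iteration.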

# Count the number of 1 bits.
def onemax(vector)
  return vector.inject(0){|sum, value| sum + value}
end

def random_bitstring(size)
  return Array.new(size){ ((rand()<0.5) ? 1 : 0) }
end

# Pick two distinct members at random and return the fitter (higher fitness).
def binary_tournament(pop)
  i, j = rand(pop.size), rand(pop.size)
  j = rand(pop.size) while j==i
  return (pop[i][:fitness] > pop[j][:fitness]) ? pop[i] : pop[j]
end

# Frequency of a 1 bit at each position across the given population.
def calculate_bit_probabilities(pop)
  vector = Array.new(pop.first[:bitstring].length, 0.0)
  pop.each do |member|
    member[:bitstring].each_with_index {|v, i| vector[i] += v}
  end
  vector.each_with_index {|f, i| vector[i] = (f.to_f/pop.size.to_f)}
  return vector
end

# Sample a bitstring from the vector of probabilities.
def generate_candidate(vector)
  candidate = {}
  candidate[:bitstring] = Array.new(vector.size)
  vector.each_with_index do |p, i|
    candidate[:bitstring][i] = (rand()<p) ? 1 : 0
  end
  return candidate
end

def search(num_bits, max_iter, pop_size, select_size)
  pop = Array.new(pop_size) do
    {:bitstring=>random_bitstring(num_bits)}
  end
  pop.each{|c| c[:fitness] = onemax(c[:bitstring])}
  best = pop.sort{|x,y| y[:fitness] <=> x[:fitness]}.first
  max_iter.times do |iter|
    selected = Array.new(select_size) { binary_tournament(pop) }
    vector = calculate_bit_probabilities(selected)
    samples = Array.new(pop_size) { generate_candidate(vector) }
    samples.each{|c| c[:fitness] = onemax(c[:bitstring])}
    samples.sort!{|x,y| y[:fitness] <=> x[:fitness]}
    best = samples.first if samples.first[:fitness] > best[:fitness]
    pop = samples
    puts " >iteration=#{iter}, f=#{best[:fitness]}, s=#{best[:bitstring]}"
  end
  return best
end

if __FILE__ == $0
  # problem configuration
  num_bits = 64
  # algorithm configuration
  max_iter = 100
  pop_size = 50
  select_size = 30
  # execute the algorithm
  best = search(num_bits, max_iter, pop_size, select_size)
  puts "done! Solution: f=#{best[:fitness]}, s=#{best[:bitstring]}"
end

Listing 5.2: Univariate Marginal Distribution Algorithm in Ruby 

5.3.7 References 
Primary Sources 

The Univariate Marginal Distribution Algorithm was described by
Mühlenbein in 1997 in a paper that provides a theoretical foundation (for
the field of investigation in general and the algorithm specifically) [2].
Mühlenbein also describes an incremental version of UMDA (IUMDA)
that is described as being equivalent to Baluja's Population-Based
Incremental Learning (PBIL) algorithm [1].

Learn More 

Pelikan and Mühlenbein extended the approach to cover problems
that have dependencies between the components (specifically
pair-dependencies), referring to the technique as the Bivariate Marginal
Distribution Algorithm (BMDA) [3, 4].

5.3.8 Bibliography 

[1] S. Baluja. Population-based incremental learning: A method for inte- 
grating genetic search based function optimization and competitive 
learning. Technical Report CMU-CS-94-163, School of Computer 
Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, 
June 1994. 

[2] H. Mühlenbein. The equation for response to selection and its use
for prediction. Evolutionary Computation, 5(3):303-346, 1997.

[3] M. Pelikan and H. Mühlenbein. Marginal distributions in evolutionary
algorithms. In Proceedings of the International Conference on
Genetic Algorithms Mendel, 1998.

[4] M. Pelikan and H. Mühlenbein. Advances in Soft Computing:
Engineering Design and Manufacturing, chapter The Bivariate Marginal
Distribution Algorithm, pages 521-535. Springer, 1999.






5.4 Compact Genetic Algorithm 

Compact Genetic Algorithm, CGA, cGA.

5.4.1 Taxonomy 

The Compact Genetic Algorithm is an Estimation of Distribution
Algorithm (EDA), also referred to as Probabilistic Model-Building Genetic
Algorithms (PMBGA), an extension to the field of Evolutionary
Computation. The Compact Genetic Algorithm is the basis for extensions
such as the Extended Compact Genetic Algorithm (ECGA). It is related
to other EDAs such as the Univariate Marginal Distribution Algorithm
(Section 5.3), the Population-Based Incremental Learning algorithm
(Section 5.2), and the Bayesian Optimization Algorithm (Section 5.5).

5.4.2 Inspiration 

The Compact Genetic Algorithm is a probabilistic technique without an 
inspiration. It is related to the Genetic Algorithm and other Evolutionary 
Algorithms that are inspired by the biological theory of evolution by 
means of natural selection. 

5.4.3 Strategy 

The information processing objective of the algorithm is to simulate the
behavior of a Genetic Algorithm with a much smaller memory footprint
(without requiring a population to be maintained). This is achieved
by maintaining a vector that specifies the probability of including each
component in new candidate solutions. Candidate solutions are
probabilistically generated from the vector, and the components in
the better solution are used to make small changes to the probabilities
in the vector.

5.4.4 Procedure 

The Compact Genetic Algorithm maintains a real-valued prototype
vector that represents the probability of each component being expressed
in a candidate solution. Algorithm 5.4.1 provides a pseudocode listing
of the Compact Genetic Algorithm for minimizing a cost function. The
parameter n determines the amount (1/n) by which the probabilities are
updated for conflicting bits in each algorithm iteration.
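The update for conflicting bits can be worked through on a small case. The sketch below assumes bitstrings held as arrays of 0 and 1 and a step of 1/n; the clamping of probabilities to [0, 1] is an addition made here to guard against floating-point drift, and the function name is hypothetical.

```ruby
# Return a new probability vector after one winner/loser comparison.
def cga_update(vector, winner, loser, n)
  step = 1.0 / n
  vector.each_index.map do |i|
    # Positions where both samples agree carry no information.
    next vector[i] if winner[i] == loser[i]
    updated = winner[i] == 1 ? vector[i] + step : vector[i] - step
    # Assumption: keep the result a valid probability.
    updated.clamp(0.0, 1.0)
  end
end
```

With n = 20, a vector of [0.5, 0.5, 0.5], a winner of [1, 0, 1] and a loser of [0, 1, 1], the first probability moves up by 0.05, the second moves down by 0.05, and the third is unchanged because the two samples agree at that position.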

5.4.5 Heuristics 

• The vector update parameter (n) influences the amount that the 
probabilities are updated each algorithm iteration. 
Algorithm 5.4.1: Pseudocode for the cGA.

Input: Bits_num, n
Output: S_best

V ← InitializeVector(Bits_num, 0.5);
S_best ← ∅;
while ¬StopCondition() do
    S_1 ← GenerateSamples(V);
    S_2 ← GenerateSamples(V);
    S_winner, S_loser ← SelectWinnerAndLoser(S_1, S_2);
    if Cost(S_winner) < Cost(S_best) then
        S_best ← S_winner;
    end
    for i to Bits_num do
        if S^i_winner ≠ S^i_loser then
            if S^i_winner = 1 then
                V^i ← V^i + 1/n;
            else
                V^i ← V^i − 1/n;
            end
        end
    end
end
return S_best;



• The vector update parameter (n) may be considered to be compa- 
rable to the population size parameter in the Genetic Algorithm. 

• Early results demonstrate that the cGA may be comparable to a 
standard Genetic Algorithm on classical binary string optimization 
problems (such as OneMax). 

• The algorithm may be considered to have converged if the vector 
probabilities are all either 0 or 1. 
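The convergence heuristic in the last point can be expressed as a small predicate. This is a sketch under the assumption that a small tolerance is acceptable, since repeated additions of 1/n in floating point may leave values very slightly away from exactly 0 or 1; the function name and the default tolerance are hypothetical.

```ruby
# True when every probability is (within tolerance) either 0 or 1,
# meaning every sample drawn from the vector would be the same string.
def converged?(vector, tolerance = 1e-9)
  vector.all? { |p| p <= tolerance || p >= 1.0 - tolerance }
end
```

Such a check could serve as an additional stop condition alongside a maximum number of iterations.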

5.4.6 Code Listing 

Listing 5.3 provides an example of the Compact Genetic Algorithm 
implemented in the Ruby Programming Language. The demonstration 
problem is a maximizing binary optimization problem called OneMax 
that seeks a binary string of unity (all '1' bits). The objective function 
only provides an indication of the number of correct bits in a candidate 
string, not the positions of the correct bits. The algorithm is an
implementation of the Compact Genetic Algorithm that uses integer values to
represent 1 and 0 bits in a binary string representation.
# Count the number of 1 bits.
def onemax(vector)
  return vector.inject(0){|sum, value| sum + value}
end

# Sample a bitstring from the probability vector and evaluate it.
def generate_candidate(vector)
  candidate = {}
  candidate[:bitstring] = Array.new(vector.size)
  vector.each_with_index do |p, i|
    candidate[:bitstring][i] = (rand()<p) ? 1 : 0
  end
  candidate[:cost] = onemax(candidate[:bitstring])
  return candidate
end

# Shift probabilities by 1/pop_size toward the winner where the bits differ.
def update_vector(vector, winner, loser, pop_size)
  vector.size.times do |i|
    if winner[:bitstring][i] != loser[:bitstring][i]
      if winner[:bitstring][i] == 1
        vector[i] += 1.0/pop_size.to_f
      else
        vector[i] -= 1.0/pop_size.to_f
      end
    end
  end
end

def search(num_bits, max_iterations, pop_size)
  vector = Array.new(num_bits){0.5}
  best = nil
  max_iterations.times do |iter|
    c1 = generate_candidate(vector)
    c2 = generate_candidate(vector)
    winner, loser = (c1[:cost] > c2[:cost] ? [c1,c2] : [c2,c1])
    best = winner if best.nil? or winner[:cost]>best[:cost]
    update_vector(vector, winner, loser, pop_size)
    puts " >iteration=#{iter}, f=#{best[:cost]}, s=#{best[:bitstring]}"
    break if best[:cost] == num_bits
  end
  return best
end

if __FILE__ == $0
  # problem configuration
  num_bits = 32
  # algorithm configuration
  max_iterations = 200
  pop_size = 20
  # execute the algorithm
  best = search(num_bits, max_iterations, pop_size)
  puts "done! Solution: f=#{best[:cost]}/#{num_bits}, s=#{best[:bitstring]}"
end

Listing 5.3: Compact Genetic Algorithm in Ruby

5.4.7 References
Primary Sources 

The Compact Genetic Algorithm was proposed by Harik, Lobo, and 
Goldberg in 1999 [3], based on a random walk model previously in- 
troduced by Harik et al. [2]. In the introductory paper, the cGA is 
demonstrated to be comparable to the Genetic Algorithm on standard 
binary string optimization problems. 

Learn More 

Harik et al. extended the Compact Genetic Algorithm (called the Extended
Compact Genetic Algorithm) to generate populations of candidate
solutions and perform selection (much like the Univariate Marginal
Distribution Algorithm), although it used Marginal Product Models [1, 4].
Sastry and Goldberg performed further analysis into the Extended Compact
Genetic Algorithm, applying the method to a complex optimization
problem [5].

5.4.8 Bibliography 

[1] G. R. Harik. Linkage learning via probabilistic modeling in the 
extended compact genetic algorithm (ECGA). Technical Report 
99010, Illinois Genetic Algorithms Laboratory, Department of Gen- 
eral Engineering, University of Illinois, 1999. 

[2] G. R. Harik, E. Cantú-Paz, D. E. Goldberg, and B. L. Miller. The
gambler's ruin problem, genetic algorithms, and the sizing of populations.
In IEEE International Conference on Evolutionary Computation,
pages 7-12, 1997.

[3] G. R. Harik, F. G. Lobo, and D. E. Goldberg. The compact ge- 
netic algorithm. IEEE Transactions on Evolutionary Computation, 
3(4):287-297, 1999. 

[4] G. R. Harik, F. G. Lobo, and K. Sastry. Scalable Optimization via 
Probabilistic Modeling, chapter Linkage Learning via Probabilistic 
Modeling in the Extended Compact Genetic Algorithm (ECGA), 
pages 39-61. Springer, 2006. 

[5] K. Sastry and D. E. Goldberg. On extended compact genetic algo- 
rithm. In Late Breaking Paper in Genetic and Evolutionary Compu- 
tation Conference, pages 352-359, 2000. 
5.5 Bayesian Optimization Algorithm

Bayesian Optimization Algorithm, BOA. 

5.5.1 Taxonomy 

The Bayesian Optimization Algorithm belongs to the field of Estimation
of Distribution Algorithms, also referred to as Probabilistic
Model-Building Genetic Algorithms (PMBGA), an extension to the field of
Evolutionary Computation. More broadly, BOA belongs to the field of
Computational Intelligence. The Bayesian Optimization Algorithm is
related to other Estimation of Distribution Algorithms such as the
Population-Based Incremental Learning algorithm (Section 5.2), and the
Univariate Marginal Distribution Algorithm (Section 5.3). It is also the
basis for extensions such as the Hierarchical Bayesian Optimization
Algorithm (hBOA) and the Incremental Bayesian Optimization Algorithm (iBOA).

5.5.2 Inspiration 

Bayesian Optimization Algorithm is a technique without an inspiration. 
It is related to the Genetic Algorithm and other Evolutionary Algorithms 
that are inspired by the biological theory of evolution by means of natural 
selection. 

5.5.3 Strategy 

The information processing objective of the technique is to construct a
probabilistic model that describes the relationships between the
components of fit solutions in the problem space. This is achieved by
repeating the process of creating and sampling from a Bayesian network that
contains the conditional dependencies, independencies, and conditional
probabilities between the components of a solution. The network is
constructed from the relative frequencies of the components within a
population of high fitness candidate solutions. Once the network is
constructed, the candidate solutions are discarded and a new population
of candidate solutions is generated from the model. The process is
repeated until the model converges on a fit prototype solution.
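The two halves of this process, estimating conditional probabilities from selected solutions and sampling new solutions from the network, can be sketched for a network whose structure is already known. This is an illustration only and does not cover how the network structure is learned. It assumes binary components, nodes listed so that every parent appears before its children, and a fallback probability of 0.5 for parent configurations that were never observed; all names are hypothetical.

```ruby
# Estimate P(child = 1 | parent values) from the selected bitstrings.
# Returns a hash from an array of parent values to a probability.
def estimate_cpt(selected, child, parents)
  # parent values => [times seen, times the child bit was 1]
  counts = Hash.new { |hash, key| hash[key] = [0, 0] }
  selected.each do |bits|
    entry = counts[parents.map { |j| bits[j] }]
    entry[0] += 1
    entry[1] += bits[child]
  end
  counts.transform_values { |seen, ones| ones / seen.to_f }
end

# Sample one bitstring, visiting nodes in order so that the values of
# a node's parents are always known before the node itself is sampled.
def sample_from_network(network, rng = Random.new(1))
  bits = Array.new(network.size)
  network.each_with_index do |node, i|
    parent_values = node[:parents].map { |j| bits[j] }
    # Assumption: unseen parent configurations default to 0.5.
    p_one = node[:cpt].fetch(parent_values, 0.5)
    bits[i] = rng.rand < p_one ? 1 : 0
  end
  bits
end
```

A node without parents has a single table entry keyed by the empty array, which reduces to the univariate marginal probability used by simpler Estimation of Distribution Algorithms.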

5.5.4 Procedure 

Algorithm 5.5.1 provides a pseudocode listing of the Bayesian Optimiza- 
tion Algorithm for minimizing a cost function. The Bayesian network 
is constructed each iteration using a greedy algorithm. The network is 
assessed based on its fit of the information in the population of candidate 
solutions using either a Bayesian Dirichlet Metric (BD) [9], or a Bayesian
Information Criterion (BIC). Refer to Chapter 3 of Pelikan's book for a
more detailed presentation of the pseudocode for BOA [5].

Algorithm 5.5.1: Pseudocode for BOA.

Input: Bits_num, Population_size, Selection_size
Output: S_best

Population ← InitializePopulation(Bits_num, Population_size);
EvaluatePopulation(Population);
S_best ← GetBestSolution(Population);
while ¬StopCondition() do
    Selected ← SelectFitSolutions(Population, Selection_size);
    Model ← ConstructBayesianNetwork(Selected);
    Offspring ← ∅;
    for i to Population_size do
        Offspring ← ProbabilisticallyConstructSolution(Model);
    end
    EvaluatePopulation(Offspring);
    S_best ← GetBestSolution(Offspring);
    Population ← Combine(Population, Offspring);
end
return S_best;



5.5.5 Heuristics 

• The Bayesian Optimization Algorithm was designed and investigated
on binary string-based problems, most commonly representing
binary function optimization problems.

• Bayesian networks are typically constructed (grown) from scratch
each iteration using an iterative process of adding, removing, and
reversing links. Alternatively, past networks may be used as the
basis for the subsequent generation.

• A greedy hill-climbing algorithm is used each algorithm iteration to
optimize a Bayesian network to represent a population of candidate
solutions.

• The fitness of constructed Bayesian networks may be assessed using
the Bayesian Dirichlet Metric (BD) or a Minimum Description
Length method called the Bayesian Information Criterion (BIC).

5.5.6 Code Listing

Listing 5.4 provides an example of the Bayesian Optimization Algorithm 
implemented in the Ruby Programming Language. The demonstration 
problem is a maximizing binary optimization problem called OneMax 
that seeks a binary string of unity (all '1' bits). The objective function 
provides only an indication of the number of correct bits in a candidate 
string, not the positions of the correct bits. 

The Bayesian Optimization Algorithm can be tricky to implement
given the use of a Bayesian network at the core of the technique. The
implementation of BOA provided is based on the C++ implementation
provided by Pelikan, version 1.0 [3]. Specifically, the implementation
uses the K2 metric to construct a Bayesian network from a population
of candidate solutions [1]. The network is built by a greedy algorithm
that starts with an empty graph and, each iteration, adds the arc with
the largest gain under the metric until a maximum number of edges have
been added or no further edges can be added. The result is a directed
acyclic graph. The process that constructs the graph imposes limits, such
as the maximum number of edges and the maximum number of in-bound
connections per node.
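As a rough illustration of the scoring step, the sketch below computes a K2-style score for one bit given a candidate parent set, using the same product of factorial terms as the `k2equation` function in the listing. The helper names (`factorial`, `k2_score`) and the four-string population are hypothetical and chosen only to keep the arithmetic easy to follow. Parent configurations that never occur are skipped, which is equivalent because they would contribute a factor of 1.

```ruby
def factorial(n)
  n <= 1 ? 1 : n * factorial(n - 1)
end

# K2-style score of `node` given `parents` over a population of bitstrings.
# For each observed parent configuration: zeros! * ones! / (zeros + ones + 1)!
def k2_score(population, node, parents)
  groups = Hash.new { |hash, key| hash[key] = [0, 0] }
  population.each do |bits|
    key = parents.map { |q| bits[q] }
    groups[key][bits[node]] += 1
  end
  groups.values.inject(1.0) do |total, (zeros, ones)|
    total * factorial(zeros) * factorial(ones) / factorial(zeros + ones + 1).to_f
  end
end

# Bit 1 always copies bit 0; bit 2 is unrelated to bit 1.
population = [[0, 0, 1], [0, 0, 0], [1, 1, 1], [1, 1, 0]]

no_parents = k2_score(population, 1, [])
with_bit0 = k2_score(population, 1, [0])
with_bit2 = k2_score(population, 1, [2])
puts "no parents=#{no_parents}, parent 0=#{with_bit0}, parent 2=#{with_bit2}"
```

In this example the score rises when the informative parent (bit 0) is added and falls when the uninformative one (bit 2) is added, which is the signal the greedy construction uses to choose the next arc.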

New solutions are sampled from the graph by first topologically
ordering the graph (so that bits can be generated based on their
dependencies), then probabilistically sampling the bits based on the
conditional probabilities encoded in the graph. The algorithm used for
sampling the conditional probabilities from the network is Probabilistic
Logic Sampling [2]. The search stops when either the best solution for
the problem is found or the system converges to a single bit pattern.
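The sampling step can be sketched on a hand-built three-node network. This is a simplified illustration, not the listing's implementation: the node layout (`:index`, `:parents`, `:table`) and the function name `sample_network` are assumptions, and the probabilities are set to 0.0 or 1.0 so that the result does not depend on the random draws.

```ruby
# Each node lists its parents and a table mapping parent values to P(bit = 1).
# The nodes must already be in topological order (parents before children).
def sample_network(nodes, rng = Random.new)
  bits = Array.new(nodes.size)
  nodes.each do |node|
    key = node[:parents].map { |q| bits[q] }  # values already sampled
    bits[node[:index]] = (rng.rand < node[:table][key]) ? 1 : 0
  end
  bits
end

nodes = [
  {:index=>0, :parents=>[],  :table=>{[]=>1.0}},            # always 1
  {:index=>1, :parents=>[0], :table=>{[0]=>0.0, [1]=>1.0}}, # copies bit 0
  {:index=>2, :parents=>[1], :table=>{[0]=>1.0, [1]=>0.0}}  # negates bit 1
]

sample = sample_network(nodes)
puts sample.inspect
```

Because every parent is sampled before its children, each table lookup only ever needs bits that have already been assigned.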

Given that the implementation was written for clarity, it is slow
to execute and provides a great opportunity for improvements and
efficiencies.

# OneMax objective: the number of '1' bits in the string
def onemax(vector)
  return vector.inject(0){|sum, value| sum + value}
end

def random_bitstring(size)
  return Array.new(size){ ((rand()<0.5) ? 1 : 0) }
end

# true if node j can be reached from node i by following out-bound edges
def path_exists?(i, j, graph)
  visited, stack = [], [i]
  while !stack.empty?
    return true if stack.include?(j)
    k = stack.shift
    next if visited.include?(k)
    visited << k
    graph[k][:out].each {|m| stack.unshift(m) if !visited.include?(m)}
  end
  return false
end

# an edge is allowed if it does not already exist and does not create a cycle
def can_add_edge?(i, j, graph)
  return !graph[i][:out].include?(j) && !path_exists?(j, i, graph)
end

def get_viable_parents(node, graph)
  viable = []
  graph.size.times do |i|
    if node!=i and can_add_edge?(node, i, graph)
      viable << i
    end
  end
  return viable
end

# count how often each combination of values occurs at the given bit indexes
def compute_count_for_edges(pop, indexes)
  counts = Array.new(2**(indexes.size)){0}
  pop.each do |p|
    index = 0
    indexes.reverse.each_with_index do |v,i|
      index += ((p[:bitstring][v] == 1) ? 1 : 0) * (2**i)
    end
    counts[index] += 1
  end
  return counts
end

def fact(v)
  return v <= 1 ? 1 : v*fact(v-1)
end

# K2 metric for a node given a candidate set of parents
def k2equation(node, candidates, pop)
  counts = compute_count_for_edges(pop, [node]+candidates)
  total = nil
  (counts.size/2).times do |i|
    a1, a2 = counts[i*2], counts[(i*2)+1]
    rs = (1.0/fact((a1+a2)+1).to_f) * fact(a1).to_f * fact(a2).to_f
    total = (total.nil? ? rs : total*rs)
  end
  return total
end

def compute_gains(node, graph, pop, max=2)
  viable = get_viable_parents(node[:num], graph)
  gains = Array.new(graph.size) {-1.0}
  gains.each_index do |i|
    if graph[i][:in].size < max and viable.include?(i)
      gains[i] = k2equation(node[:num], node[:in]+[i], pop)
    end
  end
  return gains
end

# greedily add the edge with the largest gain until no edge improves the score
def construct_network(pop, prob_size, max_edges=3*pop.size)
  graph = Array.new(prob_size) {|i| {:out=>[], :in=>[], :num=>i} }
  gains = Array.new(prob_size)
  max_edges.times do
    max, from, to = -1, nil, nil
    graph.each_with_index do |node, i|
      gains[i] = compute_gains(node, graph, pop)
      gains[i].each_with_index {|v,j| from,to,max = i,j,v if v>max}
    end
    break if max <= 0.0
    graph[from][:out] << to
    graph[to][:in] << from
  end
  return graph
end

# order the nodes so that every node appears after all of its parents
def topological_ordering(graph)
  graph.each {|n| n[:count] = n[:in].size}
  ordered, stack = [], graph.select {|n| n[:count]==0}
  while ordered.size < graph.size
    current = stack.shift
    current[:out].each do |edge|
      node = graph.find {|n| n[:num]==edge}
      node[:count] -= 1
      stack << node if node[:count] <= 0
    end
    ordered << current
  end
  return ordered
end

def marginal_probability(i, pop)
  return pop.inject(0.0){|s,x| s + x[:bitstring][i]} / pop.size.to_f
end

# probability that the node's bit is 1 given the already sampled parent bits
def calculate_probability(node, bitstring, graph, pop)
  return marginal_probability(node[:num], pop) if node[:in].empty?
  counts = compute_count_for_edges(pop, [node[:num]]+node[:in])
  index = 0
  node[:in].reverse.each_with_index do |v,i|
    index += ((bitstring[v] == 1) ? 1 : 0) * (2**i)
  end
  i1 = index + (1*2**(node[:in].size))
  i2 = index + (0*2**(node[:in].size))
  a1, a2 = counts[i1].to_f, counts[i2].to_f
  return a1/(a1+a2)
end

def probabilistic_logic_sample(graph, pop)
  bitstring = Array.new(graph.size)
  graph.each do |node|
    prob = calculate_probability(node, bitstring, graph, pop)
    bitstring[node[:num]] = ((rand() < prob) ? 1 : 0)
  end
  return {:bitstring=>bitstring}
end

def sample_from_network(pop, graph, num_samples)
  ordered = topological_ordering(graph)
  samples = Array.new(num_samples) do
    probabilistic_logic_sample(ordered, pop)
  end
  return samples
end

def search(num_bits, max_iter, pop_size, select_size, num_children)
  pop = Array.new(pop_size) { {:bitstring=>random_bitstring(num_bits)} }
  pop.each{|c| c[:cost] = onemax(c[:bitstring])}
  # OneMax is maximized, so sort in descending order of cost
  best = pop.sort!{|x,y| y[:cost] <=> x[:cost]}.first
  max_iter.times do |it|
    selected = pop.first(select_size)
    network = construct_network(selected, num_bits)
    arcs = network.inject(0){|s,x| s+x[:out].size}
    children = sample_from_network(selected, network, num_children)
    children.each{|c| c[:cost] = onemax(c[:bitstring])}
    children.each {|c| puts " >>sample, f=#{c[:cost]} #{c[:bitstring]}"}
    pop = pop[0...(pop_size-select_size)] + children
    pop.sort! {|x,y| y[:cost] <=> x[:cost]}
    best = pop.first if pop.first[:cost] >= best[:cost]
    puts " >it=#{it}, arcs=#{arcs}, f=#{best[:cost]}, [#{best[:bitstring]}]"
    converged = pop.select {|x| x[:bitstring]!=pop.first[:bitstring]}.empty?
    break if converged or best[:cost]==num_bits
  end
  return best
end

if __FILE__ == $0
  # problem configuration
  num_bits = 20
  # algorithm configuration
  max_iter = 100
  pop_size = 50
  select_size = 15
  num_children = 25
  # execute the algorithm
  best = search(num_bits, max_iter, pop_size, select_size, num_children)
  puts "done! Solution: f=#{best[:cost]}/#{num_bits}, s=#{best[:bitstring]}"
end



Listing 5.4: Bayesian Optimization Algorithm in Ruby 



5.5.7 References 
Primary Sources 

The Bayesian Optimization Algorithm was proposed by Pelikan, Goldberg,
and Cantú-Paz in a technical report [8] that was later published [10].
The technique was proposed as an extension to the state of Estimation
of Distribution Algorithms (such as the Univariate Marginal Distribution
Algorithm and the Bivariate Marginal Distribution Algorithm) that used
a Bayesian network to model the relationships and conditional
probabilities for the components expressed in a population of fit
candidate solutions. Pelikan, Goldberg, and Cantú-Paz also described
the approach applied to deceptive binary optimization problems (trap
functions) in a paper that was published before the seminal journal
article [9].

Learn More 

Pelikan and Goldberg described an extension to the approach called
the Hierarchical Bayesian Optimization Algorithm (hBOA) [6, 7]. The
differences in the hBOA algorithm are that it replaces the decision
tables (used to store the probabilities) with decision graphs and uses a
niching method called Restricted Tournament Replacement to maintain
diversity in the selected set of candidate solutions used to construct the
network models. Pelikan's work on BOA culminated in his PhD thesis
that provides a detailed treatment of the approach, its configuration and
application [4]. Pelikan, Sastry, and Goldberg proposed the Incremental
Bayesian Optimization Algorithm (iBOA) extension of the approach that
removes the population and adds incremental updates to the Bayesian
network [11].

Pelikan published a book that focused on the technique, walking
through the development of probabilistic algorithms inspired by
evolutionary computation, a detailed look at the Bayesian Optimization
Algorithm (Chapter 3), the hierarchical extension (the Hierarchical
Bayesian Optimization Algorithm), and demonstration studies of the
approach on test problems [5].



5.6 Cross-Entropy Method

Cross-Entropy Method, Cross Entropy Method, CEM. 

5.6.1 Taxonomy 

The Cross-Entropy Method is a probabilistic optimization technique
belonging to the field of Stochastic Optimization. It is similar to other
Stochastic Optimization algorithms such as Simulated Annealing
(Section 4.2), and to Estimation of Distribution Algorithms such as the
Population-Based Incremental Learning algorithm (Section 5.2).

5.6.2 Inspiration 

The Cross-Entropy Method does not have an inspiration. It was de- 
veloped as an efficient estimation technique for rare-event probabilities 
in discrete event simulation systems and was adapted for use in opti- 
mization. The name of the technique comes from the Kullback-Leibler 
cross-entropy method for measuring the amount of information (bits) 
needed to identify an event from a set of probabilities. 

5.6.3 Strategy 

The information processing strategy of the algorithm is to sample the 
problem space and approximate the distribution of good solutions. This 
is achieved by assuming a distribution of the problem space (such 
as Gaussian), sampling the problem domain by generating candidate 
solutions using the distribution, and updating the distribution based on 
the better candidate solutions discovered. Samples are constructed step- 
wise (one component at a time) based on the summarized distribution 
of good solutions. As the algorithm progresses, the distribution becomes 
more refined until it focuses on the area or scope of optimal solutions in 
the domain. 
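The core "sample, select, refit" loop can be illustrated in one dimension. The sketch below refits a Gaussian to the best few of a fixed set of samples; the function name `refit_gaussian` and the sample values are assumptions made for illustration, and no learning rate is applied here (the refit values simply replace the old ones).

```ruby
# Refit a one-dimensional Gaussian to the `num_elite` lowest-cost samples.
# The cost function is passed in as a block.
def refit_gaussian(samples, num_elite, &cost)
  elite = samples.sort_by { |x| cost.call(x) }.first(num_elite)
  mean = elite.sum / elite.size.to_f
  variance = elite.sum { |x| (x - mean)**2 } / elite.size.to_f
  [mean, Math.sqrt(variance)]
end

# Samples drawn (conceptually) from a wide initial distribution.
samples = [4.0, -1.0, 0.5, 3.0, -0.5]

# Minimize x^2 and keep the best three samples: 0.5, -0.5 and -1.0.
mean, stdev = refit_gaussian(samples, 3) { |x| x**2 }
puts "mean=#{mean.round(4)}, stdev=#{stdev.round(4)}"
```

Repeating this step with fresh samples drawn from the refitted distribution narrows the spread around the region of low cost.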

5.6.4 Procedure 

Algorithm 5.6.1 provides a pseudocode listing of the Cross-Entropy
Method algorithm for minimizing a cost function.

Algorithm 5.6.1: Pseudocode for the Cross-Entropy Method.
Input: Problem_size, Samples_num, UpdateSamples_num, Learn_rate,
       Variance_min
Output: S_best

Means ← InitializeMeans();
Variances ← InitializeVariances();
S_best ← ∅;
while Max(Variances) > Variance_min do
    Samples ← ∅;
    for i = 0 to Samples_num do
        Samples ← GenerateSample(Means, Variances);
    end
    EvaluateSamples(Samples);
    SortSamplesByQuality(Samples);
    if Cost(Samples_0) < Cost(S_best) then
        S_best ← Samples_0;
    end
    Samples_selected ← SelectBestSamples(Samples, UpdateSamples_num);
    for i = 0 to Problem_size do
        Means_i ← Means_i + Learn_rate × Mean(Samples_selected, i);
        Variances_i ← Variances_i + Learn_rate × Variance(Samples_selected, i);
    end
end
return S_best;

5.6.5 Heuristics

• The Cross-Entropy Method was adapted for combinatorial
optimization problems, although it has been applied to continuous
function optimization as well as noisy simulation problems.

• An alpha (α) parameter or learning rate ∈ [0, 1] is typically set high,
such as 0.7.

• A smoothing function can be used to further control the updates to
the summaries of the distribution(s) of samples from the problem
space. For example, in continuous function optimization a β
parameter may replace α for updating the standard deviation,
calculated at time t as $\beta_t = \beta - \beta \times (1 - \frac{1}{t})^q$,
where β is initially set high ∈ [0.8, 0.99] and q is a small
integer ∈ [5, 10].
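The time-dependent smoothing parameter from the last heuristic can be tabulated with a few lines of Ruby. The function name `dynamic_beta` is hypothetical, and the values β = 0.9 and q = 5 are simply picked from the ranges quoted above. The sketch only computes the schedule; how the resulting value is weighted against the previous standard deviation is left to the implementation.

```ruby
# beta_t = beta - beta * (1 - 1/t)^q, for iteration t = 1, 2, 3, ...
def dynamic_beta(beta, q, t)
  beta - beta * (1.0 - 1.0 / t)**q
end

beta, q = 0.9, 5
schedule = (1..5).map { |t| dynamic_beta(beta, q, t) }
schedule.each_with_index do |value, i|
  puts "t=#{i + 1}, beta_t=#{value.round(6)}"
end
```

The schedule starts at β for t = 1 and shrinks as t grows, so the smoothing applied to the standard deviation changes gradually over the run rather than staying fixed.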



5.6.6 Code Listing 

Listing 5.5 provides an example of the Cross-Entropy Method algorithm
implemented in the Ruby Programming Language. The demonstration
problem is an instance of a continuous function optimization problem
that seeks $\min f(x)$ where $f = \sum_{i=1}^{n} x_i^2$,
$-5.0 \leq x_i \leq 5.0$ and $n = 3$. The optimal solution for this
basin function is $(v_0, \ldots, v_{n-1}) = 0.0$.

The algorithm was implemented based on a description of the
Cross-Entropy Method algorithm for continuous function optimization by
Rubinstein and Kroese in Chapter 5 and Appendix A of their book on
the method [5]. The algorithm maintains means and standard deviations
of the distribution of samples for convenience. The means and standard
deviations are initialized based on random positions in the problem space
and the bounds of the whole problem space respectively. A smoothing
parameter is not used on the standard deviations.

# sum of squares (basin function), minimized at the origin
def objective_function(vector)
  return vector.inject(0.0) {|sum, x| sum + (x ** 2.0)}
end

def random_variable(minmax)
  min, max = minmax
  return min + ((max - min) * rand())
end

# draw from a Gaussian using the polar form of the Box-Muller transform
def random_gaussian(mean=0.0, stdev=1.0)
  u1 = u2 = w = 0
  begin
    u1 = 2 * rand() - 1
    u2 = 2 * rand() - 1
    w = u1 * u1 + u2 * u2
  end while w >= 1 or w == 0 # reject w == 0 to avoid log(0)
  w = Math.sqrt((-2.0 * Math.log(w)) / w)
  return mean + (u2 * w) * stdev
end

# sample each component from its own Gaussian, clipped to the search space
def generate_sample(search_space, means, stdevs)
  vector = Array.new(search_space.size)
  search_space.size.times do |i|
    vector[i] = random_gaussian(means[i], stdevs[i])
    vector[i] = search_space[i][0] if vector[i] < search_space[i][0]
    vector[i] = search_space[i][1] if vector[i] > search_space[i][1]
  end
  return {:vector=>vector}
end

def mean_attr(samples, i)
  sum = samples.inject(0.0) do |s,sample|
    s + sample[:vector][i]
  end
  return (sum / samples.size.to_f)
end

def stdev_attr(samples, mean, i)
  sum = samples.inject(0.0) do |s,sample|
    s + (sample[:vector][i] - mean)**2.0
  end
  return Math.sqrt(sum / samples.size.to_f)
end

# blend the old distribution with the statistics of the selected samples
def update_distribution!(samples, alpha, means, stdevs)
  means.size.times do |i|
    means[i] = alpha*means[i] + ((1.0-alpha)*mean_attr(samples, i))
    stdevs[i] = alpha*stdevs[i]+((1.0-alpha)*stdev_attr(samples,means[i],i))
  end
end

def search(bounds, max_iter, num_samples, num_update, learning_rate)
  means = Array.new(bounds.size){|i| random_variable(bounds[i])}
  stdevs = Array.new(bounds.size){|i| bounds[i][1]-bounds[i][0]}
  best = nil
  max_iter.times do |iter|
    samples = Array.new(num_samples){generate_sample(bounds, means, stdevs)}
    samples.each {|samp| samp[:cost] = objective_function(samp[:vector])}
    samples.sort! {|x,y| x[:cost]<=>y[:cost]}
    best = samples.first if best.nil? or samples.first[:cost] < best[:cost]
    selected = samples.first(num_update)
    update_distribution!(selected, learning_rate, means, stdevs)
    puts " > iteration=#{iter}, fitness=#{best[:cost]}"
  end
  return best
end

if __FILE__ == $0
  # problem configuration
  problem_size = 3
  search_space = Array.new(problem_size) {|i| [-5, 5]}
  # algorithm configuration
  max_iter = 100
  num_samples = 50
  num_update = 5
  l_rate = 0.7
  # execute the algorithm
  best = search(search_space, max_iter, num_samples, num_update, l_rate)
  puts "done! Solution: f=#{best[:cost]}, s=#{best[:vector].inspect}"
end

Listing 5.5: Cross-Entropy Method in Ruby 

5.6.7 References 
Primary Sources 

The Cross-Entropy Method was proposed by Rubinstein in 1997 [2]
for use in optimizing discrete event simulation systems. It was later
generalized by Rubinstein and proposed as an optimization method for
combinatorial function optimization in 1999 [3]. This work was further
elaborated by Rubinstein providing a detailed treatment on the use of
the Cross-Entropy Method for combinatorial optimization [4].

Learn More

De Boer et al. provide a detailed presentation of the Cross-Entropy
Method including its application in rare event simulation, its adaptation
to combinatorial optimization, and example applications to the max-cut
problem, the traveling salesman problem, and a clustering numeric
optimization example [1]. Rubinstein and Kroese provide a thorough
presentation of the approach in their book, summarizing the relevant
theory and the state of the art [5].



Swarm Algorithms



6.1 Overview 

This chapter describes Swarm Algorithms. 

6.1.1 Swarm Intelligence 

Swarm intelligence is the study of computational systems inspired by
'collective intelligence'. Collective intelligence emerges through the
cooperation of large numbers of homogeneous agents in the environment.
Examples include schools of fish, flocks of birds, and colonies of ants.
Such intelligence is decentralized, self-organizing, and distributed
throughout an environment. In nature such systems are commonly used to
solve problems such as effective foraging for food, prey evading, or
colony relocation. The information is typically stored throughout the
participating homogeneous agents, or is stored or communicated in
the environment itself such as through the use of pheromones in ants,
dancing in bees, and proximity in fish and birds.

The paradigm consists of two dominant sub-fields: 1) Ant Colony
Optimization, which investigates probabilistic algorithms inspired by the
stigmergy and foraging behavior of ants, and 2) Particle Swarm
Optimization, which investigates probabilistic algorithms inspired by
flocking, schooling, and herding. Like evolutionary computation, swarm
intelligence 'algorithms' or 'strategies' are considered adaptive strategies
and are typically applied to search and optimization domains.

6.1.2 References 

Seminal books on the field of Swarm Intelligence include "Swarm
Intelligence" by Kennedy, Eberhart and Shi [10], and "Swarm Intelligence:
From Natural to Artificial Systems" by Bonabeau, Dorigo, and
Theraulaz [3]. Another excellent text book on the area is "Fundamentals
of Computational Swarm Intelligence" by Engelbrecht [7]. The seminal
book reference for the field of Ant Colony Optimization is "Ant Colony
Optimization" by Dorigo and Stützle [6].

6.1.3 Extensions 

There are many other algorithms and classes of algorithm from the field
of Swarm Intelligence that were not described, including but not
limited to:

• Ant Algorithms: such as Max-Min Ant Systems [15], Rank-Based
Ant Systems [4], Elitist Ant Systems [5], Hyper Cube Ant Colony
Optimization [2], Approximate Nondeterministic Tree-Search
(ANTS) [12], and Multiple Ant Colony System [8].

• Bee Algorithms: such as Bee System and Bee Colony Optimization
[11], the Honey Bee Algorithm [16], and Artificial Bee Colony
Optimization [1, 9].

• Other Social Insects: algorithms inspired by other social insects
besides ants and bees, such as the Firefly Algorithm [18] and the
Wasp Swarm Algorithm [14].

• Extensions to Particle Swarm: such as Repulsive Particle
Swarm Optimization [17].

• Bacteria Algorithms: such as the Bacteria Chemotaxis
Algorithm [13].



6.2 Particle Swarm Optimization

Particle Swarm Optimization, PSO. 

6.2.1 Taxonomy 

Particle Swarm Optimization belongs to the field of Swarm Intelli- 
gence and Collective Intelligence and is a sub-field of Computational 
Intelligence. Particle Swarm Optimization is related to other Swarm 
Intelligence algorithms such as Ant Colony Optimization and it is a 
baseline algorithm for many variations, too numerous to list. 

6.2.2 Inspiration 

Particle Swarm Optimization is inspired by the social foraging behavior 
of some animals such as flocking behavior of birds and the schooling 
behavior of fish. 

6.2.3 Metaphor 

Particles in the swarm fly through an environment following the fitter 
members of the swarm and generally biasing their movement toward 
historically good areas of their environment. 

6.2.4 Strategy 

The goal of the algorithm is to have all the particles locate the optima
in a multi-dimensional hyper-volume. This is achieved by assigning
initially random positions to all particles in the space and small initial
random velocities. The algorithm is executed like a simulation, advancing
the position of each particle in turn based on its velocity, the best
known global position in the problem space and the best position
known to a particle. The objective function is sampled after each
position update. Over time, through a combination of exploration and
exploitation of known good positions in the search space, the particles
cluster or converge together around an optimum, or several optima.

6.2.5 Procedure 

The Particle Swarm Optimization algorithm is comprised of a collection
of particles that move around the search space influenced by their own
best past location and the best past location of the whole swarm or a
close neighbor. Each iteration a particle's velocity is updated using:

$$v_i(t+1) = v_i(t) + \big(c_1 \times rand() \times (p_i^{best} - p_i(t))\big) + \big(c_2 \times rand() \times (p_{gbest} - p_i(t))\big)$$

where $v_i(t+1)$ is the new velocity for the $i^{th}$ particle, $c_1$ and
$c_2$ are the weighting coefficients for the personal best and global best
positions respectively, $p_i(t)$ is the $i^{th}$ particle's position at time
$t$, $p_i^{best}$ is the $i^{th}$ particle's best known position, and
$p_{gbest}$ is the best position known to the swarm. The rand() function
generates a uniformly random variable ∈ [0, 1]. Variants on this update
equation consider best positions within a particle's local neighborhood
at time $t$.

A particle's position is updated using:

$$p_i(t+1) = p_i(t) + v_i(t) \qquad (6.1)$$

Algorithm 6.2.1 provides a pseudocode listing of the Particle Swarm
Optimization algorithm for minimizing a cost function.
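A single update can be worked through by hand in one dimension. In the sketch below the two random draws are passed in as fixed arguments (`r1`, `r2`) so the arithmetic is reproducible; the function name `next_velocity` and all numeric values are assumptions made for illustration. The position step adds the freshly updated velocity, as the Ruby listing in this section does.

```ruby
# One-dimensional velocity update with the random draws supplied explicitly.
def next_velocity(v, position, p_best, g_best, c1, c2, r1, r2)
  v + (c1 * r1 * (p_best - position)) + (c2 * r2 * (g_best - position))
end

position = 1.0  # current position
velocity = 0.5  # current velocity, moving away from both best positions
p_best = 0.5    # best position this particle has visited
g_best = 0.0    # best position known to the swarm

# 0.5 + (2.0 * 0.25 * -0.5) + (2.0 * 0.5 * -1.0) = 0.5 - 0.25 - 1.0
velocity = next_velocity(velocity, position, p_best, g_best, 2.0, 2.0, 0.25, 0.5)
position += velocity
puts "velocity=#{velocity}, position=#{position}"
```

The pull toward the personal and global best positions outweighs the particle's previous motion, so its direction reverses and it moves toward the known good region.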

6.2.6 Heuristics 

• The number of particles should be low, around 20-40.

• The speed a particle can move (maximum change in its position 
per iteration) should be bounded, such as to a percentage of the 
size of the domain. 

• The learning factors (biases towards global and personal best 
positions) should be between 0 and 4, typically 2. 

• A local bias (local neighborhood) factor can be introduced where 
neighbors are determined based on Euclidean distance between 
particle positions. 

• Particles may leave the boundary of the problem space and may
be penalized, be reflected back into the domain or biased to return
back toward a position in the problem domain. Alternatively, a
wrapping strategy may be used at the edge of the domain creating
a loop, torus, or related geometrical structure at the chosen
dimensionality.

• An inertia or momentum coefficient can be introduced to limit the
change in velocity.

Algorithm 6.2.1: Pseudocode for PSO.
Input: ProblemSize, Population_size
Output: P_g_best

Population ← ∅;
P_g_best ← ∅;
for i = 1 to Population_size do
    P_velocity ← RandomVelocity();
    P_position ← RandomPosition(Population_size);
    P_p_best ← P_position;
    if Cost(P_p_best) ≤ Cost(P_g_best) then
        P_g_best ← P_p_best;
    end
end
while ¬StopCondition() do
    foreach P ∈ Population do
        P_velocity ← UpdateVelocity(P_velocity, P_g_best, P_p_best);
        P_position ← UpdatePosition(P_position, P_velocity);
        if Cost(P_position) ≤ Cost(P_p_best) then
            P_p_best ← P_position;
            if Cost(P_p_best) ≤ Cost(P_g_best) then
                P_g_best ← P_p_best;
            end
        end
    end
end
return P_g_best;
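Two of the heuristics in this section, bounding the speed of a particle and damping its velocity with an inertia coefficient, can be combined in a small helper. This is a sketch under stated assumptions: the name `bounded_velocity`, the inertia value of 0.5, and the choice of 10% of the domain width as the speed limit are all illustrative, and `pull` stands for the combined personal-best and global-best terms of the velocity update.

```ruby
# Inertia-weighted velocity with a hard speed limit (hypothetical helper).
def bounded_velocity(velocity, pull, inertia, max_speed)
  updated = (inertia * velocity) + pull
  [[updated, -max_speed].max, max_speed].min
end

domain = [-5.0, 5.0]
max_speed = 0.1 * (domain[1] - domain[0]) # assumption: 10% of the domain width

fast = bounded_velocity(4.0, -0.25, 0.5, max_speed)    # 1.75 is clipped
slow = bounded_velocity(0.5, 0.25, 0.5, max_speed)     # 0.5 is within bounds
reverse = bounded_velocity(-4.0, 0.0, 0.5, max_speed)  # -2.0 is clipped
puts "fast=#{fast}, slow=#{slow}, reverse=#{reverse}"
```

An inertia value below 1 shrinks the carried-over velocity each step, while the clamp guarantees that no single update moves a particle further than the chosen fraction of the domain.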



6.2.7 Code Listing 

Listing 6.1 provides an example of the Particle Swarm Optimization
algorithm implemented in the Ruby Programming Language. The
demonstration problem is an instance of a continuous function
optimization that seeks $\min f(x)$ where $f = \sum_{i=1}^{n} x_i^2$,
$-5.0 \leq x_i \leq 5.0$ and $n = 2$. The optimal solution for this
basin function is $(v_0, \ldots, v_{n-1}) = 0.0$. The algorithm is a
conservative version of Particle Swarm Optimization based on the seminal
papers. The implementation limits the velocity at a pre-defined maximum,
and bounds particles to the search space, reflecting their movement and
velocity if the bounds of the space are exceeded. Particles are influenced
by the best position found as well as their own personal best position.
Natural extensions may consider limiting velocity with an inertia
coefficient and including a neighborhood function for the particles.

# sum of squares (basin function), minimized at the origin
def objective_function(vector)
  return vector.inject(0.0) {|sum, x| sum + (x ** 2.0)}
end

def random_vector(minmax)
  return Array.new(minmax.size) do |i|
    minmax[i][0] + ((minmax[i][1] - minmax[i][0]) * rand())
  end
end

def create_particle(search_space, vel_space)
  particle = {}
  particle[:position] = random_vector(search_space)
  particle[:cost] = objective_function(particle[:position])
  particle[:b_position] = Array.new(particle[:position])
  particle[:b_cost] = particle[:cost]
  particle[:velocity] = random_vector(vel_space)
  return particle
end

# return a copy of the best position seen so far by any particle
def get_global_best(population, current_best=nil)
  population.sort!{|x,y| x[:cost] <=> y[:cost]}
  best = population.first
  if current_best.nil? or best[:cost] <= current_best[:cost]
    current_best = {}
    current_best[:position] = Array.new(best[:position])
    current_best[:cost] = best[:cost]
  end
  return current_best
end

# pull the particle toward its personal best and the global best,
# then clamp each velocity component to [-max_v, max_v]
def update_velocity(particle, gbest, max_v, c1, c2)
  particle[:velocity].each_with_index do |v,i|
    v1 = c1 * rand() * (particle[:b_position][i] - particle[:position][i])
    v2 = c2 * rand() * (gbest[:position][i] - particle[:position][i])
    particle[:velocity][i] = v + v1 + v2
    particle[:velocity][i] = max_v if particle[:velocity][i] > max_v
    particle[:velocity][i] = -max_v if particle[:velocity][i] < -max_v
  end
end

# move the particle, reflecting position and velocity at the bounds
def update_position(part, bounds)
  part[:position].each_with_index do |v,i|
    part[:position][i] = v + part[:velocity][i]
    if part[:position][i] > bounds[i][1]
      part[:position][i]=bounds[i][1]-(part[:position][i]-bounds[i][1]).abs
      part[:velocity][i] *= -1.0
    elsif part[:position][i] < bounds[i][0]
      part[:position][i]=bounds[i][0]+(part[:position][i]-bounds[i][0]).abs
      part[:velocity][i] *= -1.0
    end
  end
end

def update_best_position(particle)
  return if particle[:cost] > particle[:b_cost]
  particle[:b_cost] = particle[:cost]
  particle[:b_position] = Array.new(particle[:position])
end

def search(max_gens, search_space, vel_space, pop_size, max_vel, c1, c2)
  pop = Array.new(pop_size) {create_particle(search_space, vel_space)}
  gbest = get_global_best(pop)
  max_gens.times do |gen|
    pop.each do |particle|
      update_velocity(particle, gbest, max_vel, c1, c2)
      update_position(particle, search_space)
      particle[:cost] = objective_function(particle[:position])
      update_best_position(particle)
    end
    gbest = get_global_best(pop, gbest)
    puts " > gen #{gen+1}, fitness=#{gbest[:cost]}"
  end
  return gbest
end

if __FILE__ == $0
  # problem configuration
  problem_size = 2
  search_space = Array.new(problem_size) {|i| [-5, 5]}
  # algorithm configuration
  vel_space = Array.new(problem_size) {|i| [-1, 1]}
  max_gens = 100
  pop_size = 50
  max_vel = 100.0
  c1, c2 = 2.0, 2.0
  # execute the algorithm
  best = search(max_gens, search_space, vel_space, pop_size, max_vel, c1,c2)
  puts "done! Solution: f=#{best[:cost]}, s=#{best[:position].inspect}"
end

Listing 6.1: Particle Swarm Optimization in Ruby 



6.2.8 References 
Primary Sources 

Particle Swarm Optimization was described as a stochastic global op- 
timization method for continuous functions in 1995 by Eberhart and 
Kennedy [1, 3]. This work was motivated as an optimization method 
loosely based on the flocking behavioral models of Reynolds [7] . Early 
works included the introduction of inertia [8] and early study of social 
topologies in the swarm by Kennedy [2]. 

Learn More 

Poli, Kennedy, and Blackwell provide a modern overview of the field
of PSO with detailed coverage of extensions to the baseline technique
[6]. Poli provides a meta-analysis of PSO publications that focus on
the application of the technique, providing a systematic breakdown of
application areas [5]. An excellent book on Swarm Intelligence in
general with detailed coverage of Particle Swarm Optimization is "Swarm
Intelligence" by Kennedy, Eberhart, and Shi [4].



6.3 Ant System

Ant System, AS, Ant Cycle. 

6.3.1 Taxonomy 

The Ant System algorithm is an example of an Ant Colony Optimiza- 
tion method from the field of Swarm Intelligence, Metaheuristics and 
Computational Intelligence. Ant System was originally the term used to 
refer to a range of Ant based algorithms, where the specific algorithm 
implementation was referred to as Ant Cycle. The so-called Ant Cycle 
algorithm is now canonically referred to as Ant System. The Ant System 
algorithm is the baseline Ant Colony Optimization method for popular 
extensions such as Elite Ant System, Rank-based Ant System, Max-Min 
Ant System, and Ant Colony System. 

6.3.2 Inspiration 

The Ant system algorithm is inspired by the foraging behavior of ants, 
specifically the pheromone communication between ants regarding a 
good path between the colony and a food source in an environment. 
This mechanism is called stigmergy. 

6.3.3 Metaphor 

Ants initially wander randomly around their environment. Once food is 
located an ant will begin laying down pheromone in the environment. 
Numerous trips between the food and the colony are performed and if 
the same route is followed that leads to food then additional pheromone 
is laid down. Pheromone decays in the environment, so that older paths 
are less likely to be followed. Other ants may discover the same path 
to the food and in turn may follow it and also lay down pheromone. 
A positive feedback process routes more and more ants to productive 
paths that are in turn further refined through use. 

6.3.4 Strategy 

The objective of the strategy is to exploit historic and heuristic informa- 
tion to construct candidate solutions and fold the information learned 
from constructing solutions into the history. Solutions are constructed 
one discrete piece at a time in a probabilistic step-wise manner. The 
probability of selecting a component is determined by the heuristic
contribution of the component to the overall cost of the solution and the
quality of the solutions in which the component is historically known
to have been included. History is updated in proportion to the quality of
candidate solutions and is uniformly decreased, ensuring that the most
recent and useful information is retained.



6.3.5 Procedure 

Algorithm 6.3.1 provides a pseudocode listing of the main Ant System
algorithm for minimizing a cost function. The pheromone update process
is described by a single equation that combines the contributions of
all candidate solutions with a decay coefficient to determine the new
pheromone value, as follows:

$$\tau_{i,j} \leftarrow (1 - \rho) \times \tau_{i,j} + \sum_{k=1}^{m} \Delta_{i,j}^{k} \qquad (6.2)$$

where $\tau_{i,j}$ represents the pheromone for the component (graph edge)
$(i,j)$, $\rho$ is the decay factor, $m$ is the number of ants, and
$\sum_{k=1}^{m} \Delta_{i,j}^{k}$ is the sum of $\frac{1}{S_{cost}}$
(maximizing solution cost) for those solutions that include component
$i,j$. The pseudocode listing shows this equation as an equivalent
two-step process of decay followed by update for simplicity.
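
The two-step reading of Equation 6.2 can be sketched as follows. This is a minimal illustration rather than the book's listing: the function name `apply_pheromone_update` and the `tours` argument (a list of `{:vector, :cost}` hashes, one per ant) are assumptions made for the example, and the matrix is treated as symmetric, as it would be for the TSP.

```ruby
# Sketch of Equation 6.2 on a symmetric pheromone matrix.
# `tours` is an illustrative list of {:vector, :cost} hashes, one per ant.
def apply_pheromone_update(pheromone, tours, rho)
  # Step 1: decay every edge by (1 - rho), copying so the input is untouched.
  updated = pheromone.map { |row| row.map { |tau| (1.0 - rho) * tau } }
  # Step 2: each ant deposits 1/cost on every edge of its closed tour.
  tours.each do |tour|
    vector = tour[:vector]
    vector.each_with_index do |x, i|
      y = vector[(i + 1) % vector.size]
      updated[x][y] += 1.0 / tour[:cost]
      updated[y][x] += 1.0 / tour[:cost]
    end
  end
  updated
end
```

Cheaper tours deposit more pheromone per edge, so edges shared by many good tours accumulate the largest values while unused edges fade at the rate set by $\rho$.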

The probabilistic step-wise construction of a solution makes use of
both history (pheromone) and problem-specific heuristic information to
incrementally construct a solution piece-by-piece. Each component
can only be selected if it has not already been chosen (for most
combinatorial problems), and for those components that can be selected
(given the current component $i$), their probability of selection is
defined as:

$$P_{i,j} \leftarrow \frac{\tau_{i,j}^{\alpha} \times \eta_{i,j}^{\beta}}{\sum_{k=1}^{c} \tau_{i,k}^{\alpha} \times \eta_{i,k}^{\beta}} \qquad (6.3)$$

where $\eta_{i,j}$ is the maximizing contribution to the overall score of
selecting the component (such as $\frac{1.0}{distance_{i,j}}$ for the
Traveling Salesman Problem), $\beta$ is the heuristic coefficient,
$\tau_{i,j}$ is the pheromone value for the component, $\alpha$ is the
history coefficient, and $c$ is the set of usable components.
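
Equation 6.3 can be illustrated for a single construction step with the short sketch below. The function name `selection_probabilities` and its array arguments are assumptions made for the example: `taus` holds the pheromone on each usable edge leaving the current component and `distances` holds the matching TSP distances.

```ruby
# Sketch of Equation 6.3 for a single construction step.
# `taus` and `distances` are illustrative arrays holding, for each usable
# component k, the pheromone on edge (i,k) and the distance from i to k.
def selection_probabilities(taus, distances, alpha, beta)
  weights = taus.zip(distances).map do |tau, dist|
    (tau ** alpha) * ((1.0 / dist) ** beta)
  end
  total = weights.sum
  # Normalize so the probabilities over the usable components sum to 1.
  weights.map { |w| w / total }
end
```

With equal pheromone on two edges of length 1 and 2, and $\beta = 2$, the shorter edge receives four times the weight of the longer one, which is how a larger heuristic coefficient makes construction greedier.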



Algorithm 6.3.1: Pseudocode for Ant System.

Input: ProblemSize, Population_size, m, ρ, α, β
Output: P_best

P_best ← CreateHeuristicSolution(ProblemSize);
Pbest_cost ← Cost(P_best);
Pheromone ← InitializePheromone(Pbest_cost);
while ¬StopCondition() do
    Candidates ← ∅;
    for i = 1 to m do
        S_i ← ProbabilisticStepwiseConstruction(Pheromone, ProblemSize, α, β);
        Si_cost ← Cost(S_i);
        if Si_cost < Pbest_cost then
            Pbest_cost ← Si_cost;
            P_best ← S_i;
        end
        Candidates ← S_i;
    end
    DecayPheromone(Pheromone, ρ);
    foreach S_i ∈ Candidates do
        UpdatePheromone(Pheromone, S_i, Si_cost);
    end
end
return P_best;



6.3.6 Heuristics

• The Ant System algorithm was designed for use with combinatorial
problems such as the TSP, knapsack problem, quadratic
assignment problems, graph coloring problems and many others.

• The history coefficient ($\alpha$) controls the amount of contribution
history plays in a component's probability of selection and is
commonly set to 1.0.

• The heuristic coefficient ($\beta$) controls the amount of contribution
problem-specific heuristic information plays in a component's
probability of selection and is commonly between 2 and 5, such as
2.5.

• The decay factor ($\rho$) controls the rate at which historic information
is lost and is commonly set to 0.5.

• The total number of ants ($m$) is commonly set to the number of
components in the problem, such as the number of cities in the
TSP.

6.3.7 Code Listing 

Listing 6.2 provides an example of the Ant System algorithm implemented
in the Ruby Programming Language. The algorithm is applied
to the Berlin52 instance of the Traveling Salesman Problem (TSP),
taken from the TSPLIB. The problem seeks a permutation of the order
to visit cities (called a tour) that minimizes the total distance traveled.
The optimal tour distance for the Berlin52 instance is 7542 units.
Extensions to the implementation that improve speed may include
pre-calculating a distance matrix for all the cities in the
problem, and pre-computing a probability matrix for choices during the
probabilistic step-wise construction of tours.
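
The distance-matrix extension mentioned above might look like the following sketch. The helper name `build_distance_matrix` is an assumption made for the example; `euc_2d` is repeated here only so that the block is self-contained.

```ruby
# Euclidean distance rounded to the nearest integer, as in the listing.
def euc_2d(c1, c2)
  Math.sqrt((c1[0] - c2[0])**2.0 + (c1[1] - c2[1])**2.0).round
end

# Build an n x n table once so later distance lookups become array reads.
def build_distance_matrix(cities)
  cities.map { |a| cities.map { |b| euc_2d(a, b) } }
end
```

A listing adapted to use the table would read `matrix[c1][c2]` wherever it currently calls `euc_2d(cities[c1], cities[c2])`, trading memory proportional to the square of the number of cities for the repeated square-root calculations.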



# Euclidean distance between two cities, rounded to the nearest integer.
def euc_2d(c1, c2)
  Math.sqrt((c1[0] - c2[0])**2.0 + (c1[1] - c2[1])**2.0).round
end

# Total length of the closed tour described by the permutation.
def cost(permutation, cities)
  distance = 0
  permutation.each_with_index do |c1, i|
    c2 = (i==permutation.size-1) ? permutation[0] : permutation[i+1]
    distance += euc_2d(cities[c1], cities[c2])
  end
  return distance
end

# Random tour produced by an in-place shuffle.
def random_permutation(cities)
  perm = Array.new(cities.size){|i| i}
  perm.each_index do |i|
    r = rand(perm.size-i) + i
    perm[r], perm[i] = perm[i], perm[r]
  end
  return perm
end

# Every edge starts with the same pheromone value.
def initialise_pheromone_matrix(num_cities, naive_score)
  v = num_cities.to_f / naive_score
  return Array.new(num_cities){|i| Array.new(num_cities, v)}
end

# Unnormalized selection weight for each city not yet visited.
def calculate_choices(cities, last_city, exclude, pheromone, c_heur, c_hist)
  choices = []
  cities.each_with_index do |coord, i|
    next if exclude.include?(i)
    prob = {:city=>i}
    prob[:history] = pheromone[last_city][i] ** c_hist
    prob[:distance] = euc_2d(cities[last_city], coord)
    prob[:heuristic] = (1.0/prob[:distance]) ** c_heur
    prob[:prob] = prob[:history] * prob[:heuristic]
    choices << prob
  end
  choices
end

# Roulette-wheel selection over the normalized weights.
def select_next_city(choices)
  sum = choices.inject(0.0){|acc,element| acc + element[:prob]}
  return choices[rand(choices.size)][:city] if sum == 0.0
  v = rand()
  choices.each_with_index do |choice, i|
    v -= (choice[:prob]/sum)
    return choice[:city] if v <= 0.0
  end
  return choices.last[:city]
end

# Build one tour city by city from a random starting city.
def stepwise_const(cities, phero, c_heur, c_hist)
  perm = []
  perm << rand(cities.size)
  begin
    choices = calculate_choices(cities, perm.last, perm, phero, c_heur, c_hist)
    next_city = select_next_city(choices)
    perm << next_city
  end until perm.size == cities.size
  return perm
end

def decay_pheromone(pheromone, decay_factor)
  pheromone.each do |array|
    array.each_with_index do |p, i|
      array[i] = (1.0 - decay_factor) * p
    end
  end
end

# Every candidate deposits 1/cost on each edge of its tour.
def update_pheromone(pheromone, solutions)
  solutions.each do |other|
    other[:vector].each_with_index do |x, i|
      y=(i==other[:vector].size-1) ? other[:vector][0] : other[:vector][i+1]
      pheromone[x][y] += (1.0 / other[:cost])
      pheromone[y][x] += (1.0 / other[:cost])
    end
  end
end

def search(cities, max_it, num_ants, decay_factor, c_heur, c_hist)
  best = {:vector=>random_permutation(cities)}
  best[:cost] = cost(best[:vector], cities)
  pheromone = initialise_pheromone_matrix(cities.size, best[:cost])
  max_it.times do |iter|
    solutions = []
    num_ants.times do
      candidate = {}
      candidate[:vector] = stepwise_const(cities, pheromone, c_heur, c_hist)
      candidate[:cost] = cost(candidate[:vector], cities)
      best = candidate if candidate[:cost] < best[:cost]
      solutions << candidate
    end
    decay_pheromone(pheromone, decay_factor)
    update_pheromone(pheromone, solutions)
    puts " > iteration #{(iter+1)}, best=#{best[:cost]}"
  end
  return best
end

if __FILE__ == $0
  # problem configuration
  berlin52 = [[565,575],[25,185],[345,750],[945,685],[845,655],
   [880,660],[25,230],[525,1000],[580,1175],[650,1130],[1605,620],
   [1220,580],[1465,200],[1530,5],[845,680],[725,370],[145,665],
   [415,635],[510,875],[560,365],[300,465],[520,585],[480,415],
   [835,625],[975,580],[1215,245],[1320,315],[1250,400],[660,180],
   [410,250],[420,555],[575,665],[1150,1160],[700,580],[685,595],
   [685,610],[770,610],[795,645],[720,635],[760,650],[475,960],
   [95,260],[875,920],[700,500],[555,815],[830,485],[1170,65],
   [830,610],[605,625],[595,360],[1340,725],[1740,245]]
  # algorithm configuration
  max_it = 50
  num_ants = 30
  decay_factor = 0.6
  c_heur = 2.5
  c_hist = 1.0
  # execute the algorithm
  best = search(berlin52, max_it, num_ants, decay_factor, c_heur, c_hist)
  puts "Done. Best Solution: c=#{best[:cost]}, v=#{best[:vector].inspect}"
end



Listing 6.2: Ant System in Ruby 



6.3.8 References 
Primary Sources 

The Ant System was described by Dorigo, Maniezzo, and Colorni in 
an early technical report as a class of algorithms and was applied to a 
number of standard combinatorial optimization problems [4]. A series
of technical reports at this time investigated the class of algorithms 
called Ant System and the specific implementation called Ant Cycle. 
This effort contributed to Dorigo's PhD thesis published in Italian [2]. 
The seminal publication into the investigation of Ant System (with the 
implementation still referred to as Ant Cycle) was by Dorigo in 1996 [3]. 



Learn More 

The seminal book on Ant Colony Optimization in general with a detailed
treatment of Ant System is "Ant colony optimization" by Dorigo and
Stützle [5]. An earlier book "Swarm intelligence: from natural to
artificial systems" by Bonabeau, Dorigo, and Theraulaz also provides
an introduction to Swarm Intelligence with a detailed treatment of Ant
System [1].

6.4 Ant Colony System

Ant Colony System, ACS, Ant-Q. 

6.4.1 Taxonomy 

The Ant Colony System algorithm is an example of an Ant Colony Op-
timization method from the field of Swarm Intelligence, Metaheuristics
and Computational Intelligence. Ant Colony System is an extension 
to the Ant System algorithm and is related to other Ant Colony Op- 
timization methods such as Elite Ant System, and Rank-based Ant 
System. 

6.4.2 Inspiration 

The Ant Colony System algorithm is inspired by the foraging behavior of 
ants, specifically the pheromone communication between ants regarding 
a good path between the colony and a food source in an environment. 
This mechanism is called stigmergy. 

6.4.3 Metaphor 

Ants initially wander randomly around their environment. Once food is 
located an ant will begin laying down pheromone in the environment. 
Numerous trips between the food and the colony are performed and if 
the same route is followed that leads to food then additional pheromone 
is laid down. Pheromone decays in the environment, so that older paths 
are less likely to be followed. Other ants may discover the same path 
to the food and in turn may follow it and also lay down pheromone. 
A positive feedback process routes more and more ants to productive 
paths that are in turn further refined through use. 

6.4.4 Strategy 

The objective of the strategy is to exploit historic and heuristic informa- 
tion to construct candidate solutions and fold the information learned 
from constructing solutions into the history. Solutions are constructed 
one discrete piece at a time in a probabilistic step-wise manner. The 
probability of selecting a component is determined by the heuristic
contribution of the component to the overall cost of the solution and the
quality of the solutions in which the component is historically known
to have been included. History is updated in proportion to the quality
of the best known solution and is decreased in proportion to the usage of
discrete solution components.

6.4.5 Procedure

Algorithm 6.4.1 provides a pseudocode listing of the main Ant Colony
System algorithm for minimizing a cost function. The probabilistic
step-wise construction of a solution makes use of both history (pheromone)
and problem-specific heuristic information to incrementally construct
a solution piece-by-piece. Each component can only be selected if it
has not already been chosen (for most combinatorial problems), and for
those components that can be selected given the current component
$i$, their probability of selection is defined as:

$$P_{i,j} \leftarrow \frac{\tau_{i,j}^{\alpha} \times \eta_{i,j}^{\beta}}{\sum_{k=1}^{c} \tau_{i,k}^{\alpha} \times \eta_{i,k}^{\beta}} \qquad (6.4)$$

where $\eta_{i,j}$ is the maximizing contribution to the overall score of
selecting the component (such as $\frac{1.0}{distance_{i,j}}$ for the
Traveling Salesman Problem), $\beta$ is the heuristic coefficient,
$\tau_{i,j}$ is the pheromone value for the component, $\alpha$ is the
history coefficient (commonly fixed at 1.0), and $c$ is the set of usable
components. A greediness factor ($q_0$) is used to
influence when to use the above probabilistic component selection and
when to greedily select the best possible component.
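
The role of the greediness factor can be sketched as below. This is an illustration under stated assumptions, not the book's listing: the name `acs_select`, the `choices` array of `{:city, :prob}` hashes (where `:prob` is the unnormalized $\tau^{\alpha} \times \eta^{\beta}$ product) and the injectable `rng` argument are all introduced for the example.

```ruby
# Sketch of the ACS selection rule. `choices` is an illustrative array of
# {:city, :prob} hashes; `rng` is any object that responds to #rand.
def acs_select(choices, q0, rng = Random.new)
  # With probability q0, exploit: take the component with the largest weight.
  return choices.max_by { |c| c[:prob] }[:city] if rng.rand <= q0
  # Otherwise explore: roulette-wheel selection proportional to :prob.
  total = choices.sum { |c| c[:prob] }
  return choices.sample(random: rng)[:city] if total == 0.0
  v = rng.rand
  choices.each do |c|
    v -= c[:prob] / total
    return c[:city] if v <= 0.0
  end
  choices.last[:city]
end
```

Setting `q0` close to 1.0 makes nearly every step greedy, so exploration comes mainly from the local pheromone update; a value of 0.0 falls back to the purely probabilistic rule of Equation 6.4.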

A local pheromone update is performed for each solution that is
constructed to dissuade subsequent solutions from using the same
components in the same order, as follows:

$$\tau_{i,j} \leftarrow (1 - \sigma) \times \tau_{i,j} + \sigma \times \tau_{i,j}^{0} \qquad (6.5)$$

where $\tau_{i,j}$ represents the pheromone for the component (graph edge)
$(i,j)$, $\sigma$ is the local pheromone factor, and $\tau_{i,j}^{0}$ is
the initial pheromone value.
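
A small sketch makes the effect of Equation 6.5 concrete. The function name `local_update` is an assumption made for the example; it applies the rule to a single edge value.

```ruby
# Sketch of Equation 6.5 for one edge: blend the current pheromone
# with the initial value tau0 using the local pheromone factor sigma.
def local_update(tau, sigma, tau0)
  (1.0 - sigma) * tau + sigma * tau0
end
```

Because the rule is a weighted average of the current value and $\tau^{0}$, each application moves an edge a fraction $\sigma$ of the way back toward its initial pheromone, and an edge already at $\tau^{0}$ is left unchanged. Edges used by many ants within an iteration therefore lose any advantage they had, which pushes later ants toward different components.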

At the end of each iteration, the pheromone is updated and decayed
using the best candidate solution found thus far (or the best candidate
solution found for the iteration), as follows:

$$\tau_{i,j} \leftarrow (1 - \rho) \times \tau_{i,j} + \rho \times \Delta\tau_{i,j} \qquad (6.6)$$

where $\tau_{i,j}$ represents the pheromone for the component (graph edge)
$(i,j)$, $\rho$ is the decay factor, and $\Delta\tau_{i,j}$ is the maximizing
solution cost for the best solution found so far if the component $i,j$ is
used in the globally best known solution, otherwise it is 0.



Algorithm 6.4.1: Pseudocode for Ant Colony System.

Input: ProblemSize, Population_size, m, ρ, β, σ, q0
Output: P_best

P_best ← CreateHeuristicSolution(ProblemSize);
Pbest_cost ← Cost(P_best);
Pheromone_init ← 1.0 / (ProblemSize × Pbest_cost);
Pheromone ← InitializePheromone(Pheromone_init);
while ¬StopCondition() do
    for i = 1 to m do
        S_i ← ConstructSolution(Pheromone, ProblemSize, β, q0);
        Si_cost ← Cost(S_i);
        if Si_cost < Pbest_cost then
            Pbest_cost ← Si_cost;
            P_best ← S_i;
        end
        LocalUpdateAndDecayPheromone(Pheromone, S_i, Si_cost, σ);
    end
    GlobalUpdateAndDecayPheromone(Pheromone, P_best, Pbest_cost, ρ);
end
return P_best;



6.4.6 Heuristics

• The Ant Colony System algorithm was designed for use with
combinatorial problems such as the TSP, knapsack problem, quadratic
assignment problems, graph coloring problems and many others.

• The local pheromone (history) coefficient ($\sigma$) controls the amount
of contribution history plays in a component's probability of
selection and is commonly set to 0.1.

• The heuristic coefficient ($\beta$) controls the amount of contribution
problem-specific heuristic information plays in a component's
probability of selection and is commonly between 2 and 5, such as
2.5.

• The decay factor ($\rho$) controls the rate at which historic information
is lost and is commonly set to 0.1.

• The greediness factor ($q_0$) is commonly set to 0.9.

• The total number of ants ($m$) is commonly set low, such as 10.

6.4.7 Code Listing 



Listing 6.3 provides an example of the Ant Colony System algorithm
implemented in the Ruby Programming Language. The algorithm is
applied to the Berlin52 instance of the Traveling Salesman Problem
(TSP), taken from the TSPLIB. The problem seeks a permutation
of the order to visit cities (called a tour) that minimizes the total
distance traveled. The optimal tour distance for the Berlin52 instance is
7542 units. Extensions to the implementation that improve speed
may include pre-calculating a distance matrix for all
the cities in the problem, and pre-computing a probability matrix for
choices during the probabilistic step-wise construction of tours.

# Euclidean distance between two cities, rounded to the nearest integer.
def euc_2d(c1, c2)
  Math.sqrt((c1[0] - c2[0])**2.0 + (c1[1] - c2[1])**2.0).round
end

# Total length of the closed tour described by the permutation.
def cost(permutation, cities)
  distance = 0
  permutation.each_with_index do |c1, i|
    c2 = (i==permutation.size-1) ? permutation[0] : permutation[i+1]
    distance += euc_2d(cities[c1], cities[c2])
  end
  return distance
end

# Random tour produced by an in-place shuffle.
def random_permutation(cities)
  perm = Array.new(cities.size){|i| i}
  perm.each_index do |i|
    r = rand(perm.size-i) + i
    perm[r], perm[i] = perm[i], perm[r]
  end
  return perm
end

def initialise_pheromone_matrix(num_cities, init_pher)
  return Array.new(num_cities){|i| Array.new(num_cities, init_pher)}
end

# Unnormalized selection weight for each city not yet visited.
def calculate_choices(cities, last_city, exclude, pheromone, c_heur, c_hist)
  choices = []
  cities.each_with_index do |coord, i|
    next if exclude.include?(i)
    prob = {:city=>i}
    prob[:history] = pheromone[last_city][i] ** c_hist
    prob[:distance] = euc_2d(cities[last_city], coord)
    prob[:heuristic] = (1.0/prob[:distance]) ** c_heur
    prob[:prob] = prob[:history] * prob[:heuristic]
    choices << prob
  end
  return choices
end

# Roulette-wheel selection over the normalized weights.
def prob_select(choices)
  sum = choices.inject(0.0){|acc,element| acc + element[:prob]}
  return choices[rand(choices.size)][:city] if sum == 0.0
  v = rand()
  choices.each_with_index do |choice, i|
    v -= (choice[:prob]/sum)
    return choice[:city] if v <= 0.0
  end
  return choices.last[:city]
end

# Pick the city with the largest weight.
def greedy_select(choices)
  return choices.max{|a,b| a[:prob]<=>b[:prob]}[:city]
end

# Build one tour, choosing greedily with probability c_greed.
def stepwise_const(cities, phero, c_heur, c_greed)
  perm = []
  perm << rand(cities.size)
  begin
    choices = calculate_choices(cities, perm.last, perm, phero, c_heur, 1.0)
    greedy = rand() <= c_greed
    next_city = (greedy) ? greedy_select(choices) : prob_select(choices)
    perm << next_city
  end until perm.size == cities.size
  return perm
end

# Equation 6.6 applied to the edges of the best tour.
def global_update_pheromone(phero, cand, decay)
  cand[:vector].each_with_index do |x, i|
    y = (i==cand[:vector].size-1) ? cand[:vector][0] : cand[:vector][i+1]
    value = ((1.0-decay)*phero[x][y]) + (decay*(1.0/cand[:cost]))
    phero[x][y] = value
    phero[y][x] = value
  end
end

# Equation 6.5 applied to the edges of a newly constructed tour.
def local_update_pheromone(pheromone, cand, c_local_phero, init_phero)
  cand[:vector].each_with_index do |x, i|
    y = (i==cand[:vector].size-1) ? cand[:vector][0] : cand[:vector][i+1]
    value = ((1.0-c_local_phero)*pheromone[x][y])+(c_local_phero*init_phero)
    pheromone[x][y] = value
    pheromone[y][x] = value
  end
end

def search(cities, max_it, num_ants, decay, c_heur, c_local_phero, c_greed)
  best = {:vector=>random_permutation(cities)}
  best[:cost] = cost(best[:vector], cities)
  init_pheromone = 1.0 / (cities.size.to_f * best[:cost])
  pheromone = initialise_pheromone_matrix(cities.size, init_pheromone)
  max_it.times do |iter|
    solutions = []
    num_ants.times do
      cand = {}
      cand[:vector] = stepwise_const(cities, pheromone, c_heur, c_greed)
      cand[:cost] = cost(cand[:vector], cities)
      best = cand if cand[:cost] < best[:cost]
      local_update_pheromone(pheromone, cand, c_local_phero, init_pheromone)
    end
    global_update_pheromone(pheromone, best, decay)
    puts " > iteration #{(iter+1)}, best=#{best[:cost]}"
  end
  return best
end

if __FILE__ == $0
  # problem configuration
  berlin52 = [[565,575],[25,185],[345,750],[945,685],[845,655],
   [880,660],[25,230],[525,1000],[580,1175],[650,1130],[1605,620],
   [1220,580],[1465,200],[1530,5],[845,680],[725,370],[145,665],
   [415,635],[510,875],[560,365],[300,465],[520,585],[480,415],
   [835,625],[975,580],[1215,245],[1320,315],[1250,400],[660,180],
   [410,250],[420,555],[575,665],[1150,1160],[700,580],[685,595],
   [685,610],[770,610],[795,645],[720,635],[760,650],[475,960],
   [95,260],[875,920],[700,500],[555,815],[830,485],[1170,65],
   [830,610],[605,625],[595,360],[1340,725],[1740,245]]
  # algorithm configuration
  max_it = 100
  num_ants = 10
  decay = 0.1
  c_heur = 2.5
  c_local_phero = 0.1
  c_greed = 0.9
  # execute the algorithm
  best = search(berlin52, max_it, num_ants, decay, c_heur, c_local_phero, c_greed)
  puts "Done. Best Solution: c=#{best[:cost]}, v=#{best[:vector].inspect}"
end



Listing 6.3: Ant Colony System in Ruby 



6.4.8 References 
Primary Sources 

The algorithm was initially investigated by Dorigo and Gambardella 
under the name Ant-Q [2, 6]. It was renamed Ant Colony System and 
further investigated first in a technical report by Dorigo and Gambardella 
[4], and later published [3]. 



Learn More 

The seminal book on Ant Colony Optimization in general with a detailed
treatment of Ant Colony System is "Ant colony optimization" by Dorigo
and Stützle [5]. An earlier book "Swarm intelligence: from natural to
artificial systems" by Bonabeau, Dorigo, and Theraulaz also provides
an introduction to Swarm Intelligence with a detailed treatment of Ant
Colony System [1].

6.5 Bees Algorithm

Bees Algorithm, BA. 

6.5.1 Taxonomy 

The Bees Algorithm belongs to Bee Inspired Algorithms and the field
of Swarm Intelligence, and more broadly the fields of Computational 
Intelligence and Metaheuristics. The Bees Algorithm is related to other 
Bee Inspired Algorithms, such as Bee Colony Optimization, and other 
Swarm Intelligence algorithms such as Ant Colony Optimization and 
Particle Swarm Optimization. 

6.5.2 Inspiration 

The Bees Algorithm is inspired by the foraging behavior of honey bees. 
Honey bees collect nectar from vast areas around their hive (more than 
10 kilometers). Bee Colonies have been observed to send bees to collect 
nectar from flower patches relative to the amount of food available at 
each patch. Bees communicate with each other at the hive via a waggle 
dance that informs other bees in the hive as to the direction, distance, 
and quality rating of food sources. 

6.5.3 Metaphor 

Honey bees collect nectar from flower patches as a food source for the 
hive. The hive sends out scouts that locate patches of flowers, who then
return to the hive and inform other bees about the fitness and location 
of a food source via a waggle dance. The scout returns to the flower 
patch with follower bees. A small number of scouts continue to search 
for new patches, while bees returning from flower patches continue to 
communicate the quality of the patch. 

6.5.4 Strategy 

The information processing objective of the algorithm is to locate and 
explore good sites within a problem search space. Scouts are sent out 
to randomly sample the problem space and locate good sites. The good 
sites are exploited via the application of a local search, where a small 
number of good sites are explored more than the others. Good sites are 
continually exploited, although many scouts are sent out each iteration
always in search of additional good sites.

6.5.5 Procedure

Algorithm 6.5.1 provides a pseudocode listing of the Bees Algorithm for 
minimizing a cost function. 



Algorithm 6.5.1: Pseudocode for the Bees Algorithm.

Input: Problem_size, Bees_num, Sites_num, EliteSites_num,
       PatchSize_init, EliteBees_num, OtherBees_num
Output: Bee_best

Population ← InitializePopulation(Bees_num, Problem_size);
while ¬StopCondition() do
    EvaluatePopulation(Population);
    Bee_best ← GetBestSolution(Population);
    NextGeneration ← ∅;
    Patch_size ← (PatchSize_init × PatchDecrease_factor);
    Sites_best ← SelectBestSites(Population, Sites_num);
    foreach Site_i ∈ Sites_best do
        RecruitedBees_num ← 0;
        if i < EliteSites_num then
            RecruitedBees_num ← EliteBees_num;
        else
            RecruitedBees_num ← OtherBees_num;
        end
        Neighborhood ← ∅;
        for j to RecruitedBees_num do
            Neighborhood ← CreateNeighborhoodBee(Site_i, Patch_size);
        end
        NextGeneration ← GetBestSolution(Neighborhood);
    end
    RemainingBees_num ← (Bees_num − Sites_num);
    for j to RemainingBees_num do
        NextGeneration ← CreateRandomBee();
    end
    Population ← NextGeneration;
end
return Bee_best;



6.5.6 Heuristics 

• The Bees Algorithm was developed to be used with continuous
and combinatorial function optimization problems.

• The $Patch_{size}$ variable is used as the neighborhood size. For
example, in a continuous function optimization problem, each
dimension of a site would be sampled as $x_i \pm (rand() \times Patch_{size})$.

• The $Patch_{size}$ variable is decreased each iteration, typically by a
constant factor (such as 0.95).

• The number of elite sites ($EliteSites_{num}$) must be < the number
of sites ($Sites_{num}$), and the number of elite bees ($EliteBees_{num}$)
is traditionally > the number of other bees ($OtherBees_{num}$).
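
The patch sampling and patch decrease described in the heuristics above can be sketched as follows. The function names `sample_in_patch` and `decayed_patch_size` and the injectable `rng` argument are assumptions made for the example. Drawing a single offset uniformly from the range of plus or minus the patch size is a slight restatement of choosing a sign and then a magnitude, and yields the same distribution.

```ruby
# Sample one neighbour of `site` within +/- patch_size per dimension,
# clamped to the bounds in `search_space` (an array of [min, max] pairs).
def sample_in_patch(site, patch_size, search_space, rng = Random.new)
  site.each_with_index.map do |x, i|
    offset = (rng.rand * 2.0 - 1.0) * patch_size
    (x + offset).clamp(search_space[i][0], search_space[i][1])
  end
end

# Patch size after `iterations` multiplicative decreases by `factor`.
def decayed_patch_size(initial, factor, iterations)
  initial * factor**iterations
end
```

With an initial patch size of 3.0 and a factor of 0.95, the neighborhood halves roughly every 14 iterations, so the local search narrows quickly while the scout bees continue to sample the full search space.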

6.5.7 Code Listing 

Listing 6.4 provides an example of the Bees Algorithm implemented
in the Ruby Programming Language. The demonstration problem is
an instance of a continuous function optimization that seeks $\min f(x)$
where $f = \sum_{i=1}^{n} x_i^2$, $-5.0 \le x_i \le 5.0$ and $n = 3$. The optimal solution
for this basin function is $(v_0, \ldots, v_{n-1}) = 0.0$. The algorithm is an
implementation of the Bees Algorithm as described in the seminal paper
[2]. A fixed patch size decrease factor of 0.95 was applied each iteration.

# Sum of squares (basin function); the minimum is 0.0 at the origin.
def objective_function(vector)
  return vector.inject(0.0) {|sum, x| sum + (x ** 2.0)}
end

# Uniform random point inside the bounds given by minmax.
def random_vector(minmax)
  return Array.new(minmax.size) do |i|
    minmax[i][0] + ((minmax[i][1] - minmax[i][0]) * rand())
  end
end

def create_random_bee(search_space)
  return {:vector=>random_vector(search_space)}
end

# Perturb each dimension of a site by up to patch_size, clamped to bounds.
def create_neigh_bee(site, patch_size, search_space)
  vector = []
  site.each_with_index do |v,i|
    v = (rand()<0.5) ? v+rand()*patch_size : v-rand()*patch_size
    v = search_space[i][0] if v < search_space[i][0]
    v = search_space[i][1] if v > search_space[i][1]
    vector << v
  end
  bee = {}
  bee[:vector] = vector
  return bee
end

# Sample neigh_size bees around a site and return the best of them.
def search_neigh(parent, neigh_size, patch_size, search_space)
  neigh = []
  neigh_size.times do
    neigh << create_neigh_bee(parent[:vector], patch_size, search_space)
  end
  neigh.each{|bee| bee[:fitness] = objective_function(bee[:vector])}
  return neigh.sort{|x,y| x[:fitness]<=>y[:fitness]}.first
end

def create_scout_bees(search_space, num_scouts)
  return Array.new(num_scouts) do
    create_random_bee(search_space)
  end
end

def search(max_gens, search_space, num_bees, num_sites, elite_sites,
    patch_size, e_bees, o_bees)
  best = nil
  pop = Array.new(num_bees){ create_random_bee(search_space) }
  max_gens.times do |gen|
    pop.each{|bee| bee[:fitness] = objective_function(bee[:vector])}
    pop.sort!{|x,y| x[:fitness]<=>y[:fitness]}
    best = pop.first if best.nil? or pop.first[:fitness] < best[:fitness]
    next_gen = []
    # Elite sites recruit e_bees, the remaining best sites recruit o_bees.
    pop[0...num_sites].each_with_index do |parent, i|
      neigh_size = (i<elite_sites) ? e_bees : o_bees
      next_gen << search_neigh(parent, neigh_size, patch_size, search_space)
    end
    scouts = create_scout_bees(search_space, (num_bees-num_sites))
    pop = next_gen + scouts
    patch_size = patch_size * 0.95
    puts " > it=#{gen+1}, patch_size=#{patch_size}, f=#{best[:fitness]}"
  end
  return best
end

if __FILE__ == $0
  # problem configuration
  problem_size = 3
  search_space = Array.new(problem_size) {|i| [-5, 5]}
  # algorithm configuration
  max_gens = 500
  num_bees = 45
  num_sites = 3
  elite_sites = 1
  patch_size = 3.0
  e_bees = 7
  o_bees = 2
  # execute the algorithm
  best = search(max_gens, search_space, num_bees, num_sites,
    elite_sites, patch_size, e_bees, o_bees)
  puts "done! Solution: f=#{best[:fitness]}, s=#{best[:vector].inspect}"
end

Listing 6.4: Bees Algorithm in Ruby

6.5.8 References
Primary Sources 

The Bees Algorithm was proposed by Pham et al. in a technical report in 
2005 [3], and later published [2]. In this work, the algorithm was applied 
to standard instances of continuous function optimization problems. 

Learn More 

The majority of the work on the algorithm has concerned its application
to various problem domains. The following is a selection of popular
application papers: the optimization of linear antenna arrays by Guney
and Onay [1], the optimization of codebook vectors in the Learning
Vector Quantization algorithm for classification by Pham et al. [5],
optimization of neural networks for classification by Pham et al. [6], and
the optimization of clustering methods by Pham et al. [4].

6.6 Bacterial Foraging Optimization Algorithm

Bacterial Foraging Optimization Algorithm, BFOA, Bacterial Foraging 
Optimization, BFO. 



6.6.1 Taxonomy 

The Bacterial Foraging Optimization Algorithm belongs to the field of 
Bacteria Optimization Algorithms and Swarm Optimization, and more 
broadly to the fields of Computational Intelligence and Metaheuristics. It 
is related to other Bacteria Optimization Algorithms such as the Bacteria 
Chemotaxis Algorithm [3], and other Swarm Intelligence algorithms such
as Ant Colony Optimization and Particle Swarm Optimization. There 
have been many extensions of the approach that attempt to hybridize 
the algorithm with other Computational Intelligence algorithms and 
Metaheuristics such as Particle Swarm Optimization, Genetic Algorithm, 
and Tabu Search. 



6.6.2 Inspiration 

The Bacterial Foraging Optimization Algorithm is inspired by the group 
foraging behavior of bacteria such as E.coli and M.xanthus. Specifically, 
the BFOA is inspired by the chemotaxis behavior of bacteria that will 
perceive chemical gradients in the environment (such as nutrients) and 
move toward or away from specific signals. 



6.6.3 Metaphor 

Bacteria perceive the direction to food based on the gradients of chem- 
icals in their environment. Similarly, bacteria secrete attracting and 
repelling chemicals into the environment and can perceive each other in 
a similar way. Using locomotion mechanisms (such as flagella) bacteria 
can move around in their environment, sometimes moving chaotically 
(tumbling and spinning), and other times moving in a directed manner 
that may be referred to as swimming. Bacterial cells are treated like
agents in an environment, using their perception of food and other cells
as motivation to move, and stochastic tumbling and swimming-like
movement to re-locate. Depending on the cell-cell interactions, cells
may swarm a food source, and/or may aggressively repel or ignore each
other.

6.6.4 Strategy

The information processing strategy of the algorithm is to allow cells to 
stochastically and collectively swarm toward optima. This is achieved 
through a series of three processes on a population of simulated cells: 1) 
'Chemotaxis' where the cost of cells is derated by the proximity to other 
cells and cells move along the manipulated cost surface one at a time 
(the majority of the work of the algorithm), 2) 'Reproduction' where 
only those cells that performed well over their lifetime may contribute 
to the next generation, and 3) 'Elimination-dispersal' where cells are 
discarded and new random samples are inserted with a low probability. 



6.6.5 Procedure 

Algorithm 6.6.1 provides a pseudocode listing of the Bacterial Foraging 
Optimization Algorithm for minimizing a cost function. Algorithm 6.6.2 
provides the pseudocode listing for the chemotaxis and swim behavior 
of the BFOA algorithm. A bacterium's cost is derated by its interaction 
with other cells. This interaction function (g()) is calculated as follows: 



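$$
g(cell_k) = \sum_{i=1}^{S}\left[-d_{attr} \times \exp\left(-w_{attr} \times \sum_{m=1}^{P}\left(cell_m^k - other_m^i\right)^2\right)\right] + \sum_{i=1}^{S}\left[h_{repel} \times \exp\left(-w_{repel} \times \sum_{m=1}^{P}\left(cell_m^k - other_m^i\right)^2\right)\right]
$$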

where cell_k is a given cell, d_attr and w_attr are attraction coefficients, 
h_repel and w_repel are repulsion coefficients, S is the number of cells in 
the population, and P is the number of dimensions of a given cell's position 
vector. 
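The interaction term can be sketched in a few lines of Ruby. This is a minimal illustration, not the listing from this section: the method names `squared_distance` and `interaction` and the keyword arguments are assumptions made for the example, and the default coefficients are the ones quoted in the Heuristics.

```ruby
# Sum of squared coordinate differences between two position vectors.
def squared_distance(a, b)
  a.zip(b).sum { |x, y| (x - y)**2 }
end

# Cell-to-cell interaction g(cell_k): the attraction term lowers the
# derated cost and the repulsion term raises it.
# (Hypothetical helper; parameter names chosen for this sketch.)
def interaction(cell, others, d_attr: 0.1, w_attr: 0.2, h_rep: 0.1, w_rep: 10.0)
  others.sum(0.0) do |other|
    dist = squared_distance(cell, other)
    -d_attr * Math.exp(-w_attr * dist) + h_rep * Math.exp(-w_rep * dist)
  end
end

near = interaction([0.0, 0.0], [[0.0, 0.0]])   # coincident cells
mid  = interaction([0.0, 0.0], [[1.0, 0.0]])   # one unit apart
far  = interaction([0.0, 0.0], [[10.0, 0.0]])  # far apart
puts "coincident=#{near}, unit distance=#{mid}, far=#{far}"
```

With these coefficients the repulsion term decays much faster than the attraction term, so the two cancel for coincident cells, attraction dominates at moderate distances, and the whole term vanishes for distant cells.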

The remaining parameters of the algorithm are as follows: Cells_num 
is the number of cells maintained in the population, N_ed is the number 
of elimination-dispersal steps, N_re is the number of reproduction steps, 
N_c is the number of chemotaxis steps, N_s is the number of swim steps 
for a given cell, Step_size scales a random direction vector that has the 
same number of dimensions as the problem space, with each value ∈ [−1, 1], 
and P_ed is the probability of a cell being subjected to elimination and 
dispersal. 
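Because the loop parameters are nested, their product bounds the number of cost evaluations. The following sketch works that arithmetic through; the method name `max_evaluations` is an assumption for illustration, and the values are those used in the code listing later in this section.

```ruby
# Upper bound on cost-function evaluations for the nested BFOA loops.
# Each cell is evaluated once per chemotaxis step, plus at most n_s
# further times while swimming (swimming may stop early).
# (Hypothetical helper for illustration only.)
def max_evaluations(cells_num:, n_ed:, n_re:, n_c:, n_s:)
  n_ed * n_re * n_c * cells_num * (1 + n_s)
end

budget = max_evaluations(cells_num: 50, n_ed: 1, n_re: 4, n_c: 70, n_s: 4)
puts budget  # prints 70000
```

The bound is reached only if every swim runs to its full length, so the actual count is usually lower.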



6.6.6 Heuristics 

• The algorithm was designed for application to continuous function 
optimization problem domains. 
Algorithm 6.6.1: Pseudocode for the BFOA. 

Input: Problem_size, Cells_num, N_ed, N_re, N_c, N_s, Step_size, 
       d_attract, w_attract, h_repellant, w_repellant, P_ed 
Output: Cell_best 

Population ← InitializePopulation(Cells_num, Problem_size); 
for l = 0 to N_ed do 
  for k = 0 to N_re do 
    for j = 0 to N_c do 
      ChemotaxisAndSwim(Population, Problem_size, Cells_num, N_s, 
        Step_size, d_attract, w_attract, h_repellant, w_repellant); 
      foreach Cell ∈ Population do 
        if Cost(Cell) < Cost(Cell_best) then 
          Cell_best ← Cell; 
        end 
      end 
    end 
    SortByCellHealth(Population); 
    Selected ← SelectByCellHealth(Population, Cells_num / 2); 
    Population ← Selected; 
    Population ← Population + Selected; 
  end 
  foreach Cell ∈ Population do 
    if Rand() < P_ed then 
      Cell ← CreateCellAtRandomLocation(); 
    end 
  end 
end 
return Cell_best; 


• Given the loops in the algorithm, it can be configured numerous 
ways to elicit different search behavior. It is common to have a 
large number of chemotaxis iterations, and small numbers of the 
other iterations. 

• The default coefficients for swarming behavior (cell-cell interac- 
tions) are as follows: d_attract = 0.1, w_attract = 0.2, h_repellant = 
d_attract, and w_repellant = 10. 

• The step size is commonly a small fraction of the search space, 
such as 0.1. 

Algorithm 6.6.2: Pseudocode for the ChemotaxisAndSwim function. 

Input: Population, Problem_size, Cells_num, N_s, Step_size, 
       d_attract, w_attract, h_repellant, w_repellant 

foreach Cell ∈ Population do 
  Cell_fitness ← Cost(Cell) + Interaction(Cell, Population, 
    d_attract, w_attract, h_repellant, w_repellant); 
  Cell_health ← Cell_fitness; 
  Cell' ← ∅; 
  for i = 0 to N_s do 
    RandomStepDirection ← CreateStep(Problem_size); 
    Cell' ← TakeStep(RandomStepDirection, Step_size); 
    Cell'_fitness ← Cost(Cell') + Interaction(Cell', Population, 
      d_attract, w_attract, h_repellant, w_repellant); 
    if Cell'_fitness > Cell_fitness then 
      i ← N_s; 
    else 
      Cell ← Cell'; 
      Cell_health ← Cell_health + Cell'_fitness; 
    end 
  end 
end 

• During reproduction, typically the half of the population with a low 
health metric is discarded, and two copies of each member from 
the first (high-health) half of the population are retained. 

• The probability of elimination and dispersal (P_ed) is commonly set 
quite large, such as 0.25. 
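The reproduction and elimination-dispersal heuristics above can be sketched as follows. This is an illustrative sketch under stated assumptions: cells are plain hashes, the `:health` key and the method names `reproduce` and `eliminate_disperse` are invented for the example, and health is taken to be accumulated cost, so lower values are better when minimizing.

```ruby
# Reproduction: keep the healthier half and duplicate it.
# (Hypothetical helper; :health is an assumed key holding accumulated cost.)
def reproduce(cells)
  survivors = cells.sort_by { |c| c[:health] }.first(cells.size / 2)
  survivors + survivors.map(&:dup)
end

# Elimination-dispersal: with probability p_ed, replace a cell with one at
# a random position inside the search space (an array of [min, max] pairs).
def eliminate_disperse(cells, search_space, p_ed, rng = Random.new)
  cells.map do |cell|
    next cell unless rng.rand < p_ed
    { vector: search_space.map { |min, max| min + (max - min) * rng.rand }, health: 0.0 }
  end
end

cells = [3.0, 1.0, 4.0, 2.0].map { |h| { vector: [h, h], health: h } }
next_gen = reproduce(cells)
puts next_gen.map { |c| c[:health] }.inspect  # prints [1.0, 2.0, 1.0, 2.0]
```

Duplicating the survivors as separate hashes keeps the two copies independent, so dispersing one later does not move the other.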

6.6.7 Code Listing 

Listing 6.5 provides an example of the Bacterial Foraging Optimization 
Algorithm implemented in the Ruby Programming Language. The 
demonstration problem is an instance of a continuous function optimiza- 
tion that seeks min f(x) where f = \sum_{i=1}^{n} x_i^2, −5.0 ≤ x_i ≤ 5.0 and n = 2. 
The optimal solution for this basin function is (v_0, ..., v_{n−1}) = 0.0. The 
algorithm is an implementation based on the description in the seminal 
work [4]. The parameters for cell-cell interactions (attraction and repul- 
sion) were taken from the paper, and the various loop parameters were 
taken from the 'Swarming Effects' example. 

# Cost function: sum of squares, with a minimum of 0.0 at the origin.
def objective_function(vector)
  return vector.inject(0.0) {|sum, x| sum + (x ** 2.0)}
end

# Uniform random point inside the given [min, max] bounds.
def random_vector(minmax)
  return Array.new(minmax.size) do |i|
    minmax[i][0] + ((minmax[i][1] - minmax[i][0]) * rand())
  end
end

def generate_random_direction(problem_size)
  bounds = Array.new(problem_size){ [-1.0,1.0] }
  return random_vector(bounds)
end

# One half of the interaction function: sum of d * exp(w * squared distance).
def compute_cell_interaction(cell, cells, d, w)
  sum = 0.0
  cells.each do |other|
    diff = 0.0
    cell[:vector].each_index do |i|
      diff += (cell[:vector][i] - other[:vector][i])**2.0
    end
    sum += d * Math.exp(w * diff)
  end
  return sum
end

def attract_repel(cell, cells, d_attr, w_attr, h_rep, w_rep)
  attract = compute_cell_interaction(cell, cells, -d_attr, -w_attr)
  repel = compute_cell_interaction(cell, cells, h_rep, -w_rep)
  return attract + repel
end

# Fitness is the raw cost derated by the cell-cell interaction.
def evaluate(cell, cells, d_attr, w_attr, h_rep, w_rep)
  cell[:cost] = objective_function(cell[:vector])
  cell[:inter] = attract_repel(cell, cells, d_attr, w_attr, h_rep, w_rep)
  cell[:fitness] = cell[:cost] + cell[:inter]
end

# Take one step in a random direction, clamped to the search space.
def tumble_cell(search_space, cell, step_size)
  step = generate_random_direction(search_space.size)
  vector = Array.new(search_space.size)
  vector.each_index do |i|
    vector[i] = cell[:vector][i] + step_size * step[i]
    vector[i] = search_space[i][0] if vector[i] < search_space[i][0]
    vector[i] = search_space[i][1] if vector[i] > search_space[i][1]
  end
  return {:vector=>vector}
end

def chemotaxis(cells, search_space, chem_steps, swim_length, step_size,
    d_attr, w_attr, h_rep, w_rep)
  best = nil
  chem_steps.times do |j|
    moved_cells = []
    cells.each_with_index do |cell, i|
      sum_nutrients = 0.0
      evaluate(cell, cells, d_attr, w_attr, h_rep, w_rep)
      best = cell if best.nil? or cell[:cost] < best[:cost]
      sum_nutrients += cell[:fitness]
      swim_length.times do |m|
        new_cell = tumble_cell(search_space, cell, step_size)
        evaluate(new_cell, cells, d_attr, w_attr, h_rep, w_rep)
        # track the newly evaluated position, not the one already recorded
        best = new_cell if new_cell[:cost] < best[:cost]
        break if new_cell[:fitness] > cell[:fitness]
        cell = new_cell
        sum_nutrients += cell[:fitness]
      end
      cell[:sum_nutrients] = sum_nutrients
      moved_cells << cell
    end
    puts " >> chemo=#{j}, f=#{best[:fitness]}, cost=#{best[:cost]}"
    cells = moved_cells
  end
  return [best, cells]
end

def search(search_space, pop_size, elim_disp_steps, repro_steps,
    chem_steps, swim_length, step_size, d_attr, w_attr, h_rep, w_rep,
    p_eliminate)
  cells = Array.new(pop_size) { {:vector=>random_vector(search_space)} }
  best = nil
  elim_disp_steps.times do |l|
    repro_steps.times do |k|
      c_best, cells = chemotaxis(cells, search_space, chem_steps,
        swim_length, step_size, d_attr, w_attr, h_rep, w_rep)
      # copy the best cell so later dispersal cannot overwrite its vector
      best = c_best.dup if best.nil? or c_best[:cost] < best[:cost]
      puts " > best fitness=#{best[:fitness]}, cost=#{best[:cost]}"
      # sort! keeps the ordering; lower accumulated cost is healthier
      cells.sort!{|x,y| x[:sum_nutrients]<=>y[:sum_nutrients]}
      # retain two independent copies of the healthier half
      healthy = cells.first(pop_size/2)
      cells = healthy + healthy.map{|c| c.dup}
    end
    cells.each do |cell|
      if rand() <= p_eliminate
        cell[:vector] = random_vector(search_space)
      end
    end
  end
  return best
end

if __FILE__ == $0
  # problem configuration
  problem_size = 2
  search_space = Array.new(problem_size) {|i| [-5, 5]}
  # algorithm configuration
  pop_size = 50
  step_size = 0.1 # Ci
  elim_disp_steps = 1 # Ned
  repro_steps = 4 # Nre
  chem_steps = 70 # Nc
  swim_length = 4 # Ns
  p_eliminate = 0.25 # Ped
  d_attr = 0.1
  w_attr = 0.2
  h_rep = d_attr
  w_rep = 10
  # execute the algorithm
  best = search(search_space, pop_size, elim_disp_steps, repro_steps,
    chem_steps, swim_length, step_size, d_attr, w_attr, h_rep, w_rep,
    p_eliminate)
  puts "done! Solution: c=#{best[:cost]}, v=#{best[:vector].inspect}"
end



Listing 6.5: Bacterial Foraging Optimization Algorithm in Ruby 



6.6.8 References 
Primary Sources 

Early work by Liu and Passino considered models of chemotaxis as 
optimization for both E. coli and M. xanthus which were applied to 
continuous function optimization [2]. This work was consolidated by 
Passino who presented the Bacterial Foraging Optimization Algorithm 
that included a detailed presentation of the algorithm, heuristics for 
configuration, and demonstration applications and behavior dynamics 
[4]. 

Learn More 

A detailed summary of social foraging and the BFOA is provided in 
the book by Passino [5]. Passino provides a follow-up review of the 
background models of chemotaxis as optimization and describes the 
equations of the Bacterial Foraging Optimization Algorithm in detail in 
a journal article [6]. Das et al. present the algorithm and its inspiration, 
and go on to provide an in-depth analysis of the dynamics of chemotaxis 
using simplified mathematical models [1]. 
Chapter 7 

Immune Algorithms 



7.1 Overview 

This chapter describes Immune Algorithms. 
7.1.1 Immune System 

Immune Algorithms belong to the Artificial Immune Systems field of 
study concerned with computational methods inspired by the process 
and mechanisms of the biological immune system. 

A simplified description of the immune system is an organ system 
intended to protect the host organism from the threats posed to it from 
pathogens and toxic substances. Pathogens encompass a range of micro- 
organisms such as bacteria, viruses, parasites and pollen. The traditional 
perspective regarding the role of the immune system is divided into 
two primary tasks: the detection and elimination of pathogen. This 
behavior is typically referred to as the differentiation of self (molecules 
and cells that belong to the host organism) from potentially harmful 
non-self. More recent perspectives on the role of the system include a 
maintenance system [3], and a cognitive system [22]. 

The architecture of the immune system is such that a series of 
defensive layers protect the host. Once a pathogen makes it inside the 
host, it must contend with the innate and acquired immune system. 
These interrelated immunological sub-systems are comprised of many 
types of cells and molecules produced by specialized organs and processes 
to address the self-nonself problem at the lowest level using chemical 
bonding, where the surfaces of cells and molecules interact with the 
surfaces of pathogen. 

The adaptive immune system, also referred to as the acquired immune 
system, is named such because it is responsible for specializing a defense 
for the host organism based on the specific pathogen to which it is 
exposed. Unlike the innate immune system, the acquired immune 
system is present only in vertebrates (animals with a spinal column). 
The system retains a memory of exposures which it has encountered. 
This memory is recalled on reinfection exhibiting a learned pathogen 
identification. This learning process may be divided into two types 
of response. The first or primary response occurs when the system 
encounters a novel pathogen. The system is slow to respond, potentially 
taking a number of weeks to clear the infection. On re-encountering 
the same pathogen again, the system exhibits a secondary response, 
applying what was learned in the primary response and clearing up 
the infection rapidly. The memory the system acquires in the primary 
response is typically long lasting, providing pathogenic immunity for the 
lifetime of the host, two common examples of which are chickenpox 
and measles. White blood cells called lymphocytes (a type of leukocyte) are 
the most important cells in the acquired immune system. Lymphocytes 
are involved in both the identification and elimination of pathogens, and 
recirculate within the host organism's body in the blood and lymph (the 
fluid that permeates tissue). 

7.1.2 Artificial Immune Systems 

Artificial Immune Systems (AIS) is a sub-field of Computational Intelli- 
gence motivated by immunology (primarily mammalian immunology) 
that emerged in the early 1990s (for example [1, 15]), based on the 
proposal in the late 1980s to apply theoretical immunological models 
to machine learning and automated problem solving (such as [9, 12]). 
The early works in the field were inspired by exotic theoretical mod- 
els (immune network theory) and were applied to machine learning, 
control and optimization problems. The approaches were reminiscent 
of paradigms such as Artificial Neural Networks, Genetic Algorithms, 
Reinforcement Learning, and Learning Classifier Systems. The most 
formative works in giving the field an identity were those that proposed 
the immune system as an analogy for information protection systems in 
the field of computer security. The classical examples include Forrest 
et al.'s Computer Immunity [10, 11] and Kephart's Immune Anti-Virus 
[17, 18]. These works were formative for the field because they provided 
an intuitive application domain that captivated a broader audience and 
assisted in differentiating the work as an independent sub-field. 

Modern Artificial Immune Systems are inspired by one of three 
sub-fields: clonal selection, negative selection and immune network 
algorithms. The techniques are commonly used for clustering, pat- 
tern recognition, classification, optimization, and other similar machine 
learning problem domains. 

The seminal reference for those interested in the field is the text 
book by de Castro and Timmis, "Artificial Immune Systems: A New 
Computational Intelligence Approach" [8]. This reference text provides 
an introduction to immunology with a level of detail appropriate for 
a computer scientist, followed by a summary of the state of the art, 
algorithms, application areas, and case studies. 

7.1.3 Extensions 

There are many other algorithms and classes of algorithm from the field 
of Artificial Immune Systems that were not described, including but not 
limited to: 

• Clonal Selection Algorithms: such as the B-Cell Algorithm 
[16], the Multi-objective Immune System Algorithm (MISA) 
[2, 4], the Optimization Immune Algorithm (opt-IA, opt- 
IMMALG) [5, 6], and the Simple Immunological Algorithm [7]. 

• Immune Network Algorithms: such as the approach by Tim- 
mis used for clustering called the Artificial Immune Network (AIN) 
[20] (later extended and renamed the Resource Limited Artificial 
Immune System [19, 21]). 

• Negative Selection Algorithms: such as an adaptive frame- 
work called the ARTificial Immune System (ARTIS), with the 
application to intrusion detection renamed the Lightweight Intru- 
sion Detection System (LISYS) [13, 14]. 
7.2 Clonal Selection Algorithm 

Clonal Selection Algorithm, CSA, CLONALG. 

7.2.1 Taxonomy 

The Clonal Selection Algorithm (CLONALG) belongs to the field of 
Artificial Immune Systems. It is related to other Clonal Selection Algo- 
rithms such as the Artificial Immune Recognition System (Section 7.4), 
the B-Cell Algorithm (BCA), and the Multi-objective Immune System 
Algorithm (MISA). There are numerous extensions to CLONALG in- 
cluding tweaks such as the CLONALG1 and CLONALG2 approaches, a 
version for classification called CLONCLAS, and an adaptive version 
called Adaptive Clonal Selection (ACS). 

7.2.2 Inspiration 

The Clonal Selection algorithm is inspired by the Clonal Selection 
theory of acquired immunity. The clonal selection theory credited 
to Burnet was proposed to account for the behavior and capabilities 
of antibodies in the acquired immune system [2, 3]. Inspired itself 
by the principles of Darwinian natural selection theory of evolution, 
the theory proposes that antigens select-for lymphocytes (both B and 
T-cells). When a lymphocyte is selected and binds to an antigenic 
determinant, the cell proliferates making many thousands more copies 
of itself and differentiates into different cell types (plasma and memory 
cells). Plasma cells have a short lifespan and produce vast quantities of 
antibody molecules, whereas memory cells live for an extended period 
in the host anticipating future recognition of the same determinant. 
The important feature of the theory is that when a cell is selected 
and proliferates, it is subjected to small copying errors (changes to the 
genome called somatic hypermutation) that change the shape of the 
expressed receptors and subsequent determinant recognition capabilities 
of both the antibodies bound to the lymphocyte's cell surface, and the 
antibodies that plasma cells produce. 

7.2.3 Metaphor 

The theory suggests that starting with an initial repertoire of general 
immune cells, the system is able to change itself (the compositions and 
densities of cells and their receptors) in response to experience with the 
environment. Through a blind process of selection and accumulated 
variation on the large scale of many billions of cells, the acquired immune 
system is capable of acquiring the necessary information to protect the 
host organism from the specific pathogenic dangers of the environment. 
It also suggests that the system must anticipate (guess) at the pathogen 
to which it will be exposed, and requires exposure to pathogen that 
may harm the host before it can acquire the necessary information to 
provide a defense. 

7.2.4 Strategy 

The information processing principles of the clonal selection theory 
describe a general learning strategy. This strategy involves a population 
of adaptive information units (each representing a problem-solution or 
component) subjected to a competitive process for selection, which 
together with the resultant duplication and variation ultimately improves 
the adaptive fit of the information units to their environment. 

7.2.5 Procedure 

Algorithm 7.2.1 provides a pseudocode listing of the Clonal Selection 
Algorithm (CLONALG) for minimizing a cost function. The general 
CLONALG model involves the selection of antibodies (candidate solu- 
tions) based on affinity either by matching against an antigen pattern or 
via evaluation of a pattern by a cost function. Selected antibodies are 
subjected to cloning proportional to affinity, and the hypermutation of 
clones inversely-proportional to clone affinity. The resultant clonal-set 
competes with the existent antibody population for membership in 
the next generation. In addition, low-affinity population members are 
replaced by randomly generated antibodies. The pattern recognition 
variation of the algorithm includes the maintenance of a memory solution 
set which in its entirety represents a solution to the problem. A binary- 
encoding scheme is employed for the binary-pattern recognition and 
continuous function optimization examples, and an integer permutation 
scheme is employed for the Traveling Salesman Problem (TSP). 
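The binary-encoding scheme for continuous function optimization can be illustrated with a small decoding sketch. The method name `decode_param` is an assumption for this example; it reads a bit string as an unsigned integer (most significant bit first) and scales it linearly onto a parameter's bounds.

```ruby
# Decode a bit string into a real value in [min, max]. The string is read
# as an unsigned binary integer, then scaled linearly onto the interval.
# (Hypothetical helper for illustration only.)
def decode_param(bits, min, max)
  max_int = (2**bits.size) - 1
  min + (max - min) * (bits.to_i(2).to_f / max_int)
end

puts decode_param("0000", -5.0, 5.0)  # lower bound
puts decode_param("1111", -5.0, 5.0)  # upper bound
puts decode_param("1000", -5.0, 5.0)  # 8/15 of the way along the interval
```

With b bits per parameter the interval is divided into 2^b − 1 equal steps, so the number of bits sets the precision with which the optimum can be located.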

7.2.6 Heuristics 

• The CLONALG was designed as a general machine learning ap- 
proach and has been applied to pattern recognition, function 
optimization, and combinatorial optimization problem domains. 

• Binary string representations are used and decoded to a represen- 
tation suitable for a specific problem domain. 

• The number of clones created for each selected member is calcu- 
lated as a function of the repertoire size N_c = round(β · N), where 
β is the user parameter Clone_rate. 
Algorithm 7.2.1: Pseudocode for CLONALG. 

Input: Population_size, Selection_size, Problem_size, 
       RandomCells_num, Clone_rate, Mutation_rate 
Output: Population 

Population ← CreateRandomCells(Population_size, Problem_size); 
while ¬StopCondition() do 
  foreach p_i ∈ Population do 
    Affinity(p_i); 
  end 
  Population_select ← Select(Population, Selection_size); 
  Population_clones ← ∅; 
  foreach p_i ∈ Population_select do 
    Population_clones ← Clone(p_i, Clone_rate); 
  end 
  foreach p_i ∈ Population_clones do 
    Hypermutate(p_i, Mutation_rate); 
    Affinity(p_i); 
  end 
  Population ← Select(Population, Population_clones, Population_size); 
  Population_rand ← CreateRandomCells(RandomCells_num); 
  Replace(Population, Population_rand); 
end 
return Population; 



• A rank-based affinity-proportionate function is used to determine 
the number of clones created for selected members of the popula- 
tion for pattern recognition problem instances. 



• The number of random antibodies inserted each iteration is typi- 
cally very low (1-2). 



• Point mutations (bit-flips) are used in the hypermutation opera- 
tion. 



• The function exp(−ρ · f) is used to determine the probability of 
individual component mutation for a given candidate solution, 
where f is the candidate's affinity (normalized maximizing cost 
value), and ρ is the user parameter Mutation_rate. 
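The two formulas in the heuristics above, the clone count and the affinity-dependent mutation probability, can be sketched together. The method names are assumptions for this example, and the value 2.5 for the mutation rate is borrowed from the code listing in this section rather than prescribed by the heuristics.

```ruby
# Number of clones per selected antibody: N_c = round(beta * N).
def clone_count(population_size, clone_rate)
  (clone_rate * population_size).round
end

# Per-bit mutation probability exp(-rho * f), with affinity f in [0, 1].
def mutation_probability(affinity, mutation_rate)
  Math.exp(-mutation_rate * affinity)
end

# Point mutation: flip each bit independently with the given probability.
# (Hypothetical helper; rng is injectable so runs can be reproduced.)
def hypermutate(bitstring, probability, rng = Random.new)
  bitstring.each_char.map { |bit|
    rng.rand < probability ? (bit == "1" ? "0" : "1") : bit
  }.join
end

puts clone_count(100, 0.1)
puts mutation_probability(1.0, 2.5)  # best antibody: few bits change
puts mutation_probability(0.0, 2.5)  # worst antibody: every bit flips
puts hypermutate("10101010", mutation_probability(0.5, 2.5))
```

High-affinity antibodies are perturbed only slightly, which refines good solutions, while low-affinity antibodies are close to re-randomized.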
7.2.7 Code Listing 

Listing 7.1 provides an example of the Clonal Selection Algorithm 
(CLONALG) implemented in the Ruby Programming Language. The 
demonstration problem is an instance of a continuous function optimiza- 
tion that seeks min f(x) where f = \sum_{i=1}^{n} x_i^2, −5.0 ≤ x_i ≤ 5.0 and n = 2. 
The optimal solution for this basin function is (v_0, ..., v_{n−1}) = 0.0. The 
algorithm is implemented as described by de Castro and Von Zuben for 
function optimization [8]. 

# Cost function: sum of squares, with a minimum of 0.0 at the origin.
def objective_function(vector)
  return vector.inject(0.0) {|sum, x| sum + (x**2.0)}
end

# Decode each group of bits into a real value within its bounds.
def decode(bitstring, search_space, bits_per_param)
  vector = []
  search_space.each_with_index do |bounds, i|
    off, sum = i*bits_per_param, 0.0
    param = bitstring[off...(off+bits_per_param)].reverse
    param.size.times do |j|
      sum += ((param[j].chr=='1') ? 1.0 : 0.0) * (2.0 ** j.to_f)
    end
    min, max = bounds
    vector << min + ((max-min)/((2.0**bits_per_param.to_f)-1.0)) * sum
  end
  return vector
end

def evaluate(pop, search_space, bits_per_param)
  pop.each do |p|
    p[:vector] = decode(p[:bitstring], search_space, bits_per_param)
    p[:cost] = objective_function(p[:vector])
  end
end

def random_bitstring(num_bits)
  return (0...num_bits).inject(""){|s,i| s<<((rand() < 0.5) ? "1" : "0")}
end

# Flip each bit independently with probability rate.
def point_mutation(bitstring, rate)
  child = ""
  bitstring.size.times do |i|
    bit = bitstring[i].chr
    child << ((rand()<rate) ? ((bit=='1') ? "0" : "1") : bit)
  end
  return child
end

def calculate_mutation_rate(antibody, mutate_factor=-2.5)
  return Math.exp(mutate_factor * antibody[:affinity])
end

def num_clones(pop_size, clone_factor)
  return (pop_size * clone_factor).floor
end

# Normalize cost into an affinity in [0,1], where 1.0 is the best member.
def calculate_affinity(pop)
  pop.sort!{|x,y| x[:cost]<=>y[:cost]}
  min = pop.first[:cost]
  range = pop.last[:cost] - min
  if range == 0.0
    pop.each {|p| p[:affinity] = 1.0}
  else
    # subtract the minimum so that affinity cannot fall below zero
    pop.each {|p| p[:affinity] = 1.0-((p[:cost]-min)/range)}
  end
end

def clone_and_hypermutate(pop, clone_factor)
  clones = []
  num_clones = num_clones(pop.size, clone_factor)
  calculate_affinity(pop)
  pop.each do |antibody|
    m_rate = calculate_mutation_rate(antibody)
    num_clones.times do
      clone = {}
      clone[:bitstring] = point_mutation(antibody[:bitstring], m_rate)
      clones << clone
    end
  end
  return clones
end

# Add random antibodies, then keep the best pop.size members.
def random_insertion(search_space, pop, num_rand, bits_per_param)
  return pop if num_rand == 0
  rands = Array.new(num_rand) do |i|
    {:bitstring=>random_bitstring(search_space.size*bits_per_param)}
  end
  evaluate(rands, search_space, bits_per_param)
  return (pop+rands).sort{|x,y| x[:cost]<=>y[:cost]}.first(pop.size)
end

def search(search_space, max_gens, pop_size, clone_factor, num_rand,
    bits_per_param=16)
  pop = Array.new(pop_size) do |i|
    {:bitstring=>random_bitstring(search_space.size*bits_per_param)}
  end
  evaluate(pop, search_space, bits_per_param)
  best = pop.min{|x,y| x[:cost]<=>y[:cost]}
  max_gens.times do |gen|
    clones = clone_and_hypermutate(pop, clone_factor)
    evaluate(clones, search_space, bits_per_param)
    pop = (pop+clones).sort{|x,y| x[:cost]<=>y[:cost]}.first(pop_size)
    pop = random_insertion(search_space, pop, num_rand, bits_per_param)
    best = (pop + [best]).min{|x,y| x[:cost]<=>y[:cost]}
    puts " > gen #{gen+1}, f=#{best[:cost]}, s=#{best[:vector].inspect}"
  end
  return best
end

if __FILE__ == $0
  # problem configuration
  problem_size = 2
  search_space = Array.new(problem_size) {|i| [-5, +5]}
  # algorithm configuration
  max_gens = 100
  pop_size = 100
  clone_factor = 0.1
  num_rand = 2
  # execute the algorithm
  best = search(search_space, max_gens, pop_size, clone_factor,
    num_rand)
  puts "done! Solution: f=#{best[:cost]}, s=#{best[:vector].inspect}"
end



Listing 7.1: CLONALG in Ruby 



7.2.8 References 
Primary Sources 

Hidden at the back of a technical report on the applications of Artificial 
Immune Systems, de Castro and Von Zuben [6] proposed the Clonal 
Selection Algorithm (CSA) as a computational realization of the clonal 
selection principle for pattern matching and optimization. The algorithm 
was later published [7] and investigated further, at which point it was 
renamed CLONALG (CLONal selection ALGorithm) [8]. 



Learn More 

Watkins et al. exploited the inherent distributedness of CLONALG, 
proposing a parallel version of the pattern recognition 
version of the algorithm [10]. White and Garrett also investigated the 
pattern recognition version of CLONALG and generalized the approach 
for the task of binary pattern classification renaming it to Clonal Classi- 
fication (CLONCLAS) where their approach was compared to a number 
of simple Hamming distance based heuristics [11]. In an attempt to 
address concerns of algorithm efficiency, parameterization, and represen- 
tation selection for continuous function optimization Garrett proposed 
an updated version of CLONALG called Adaptive Clonal Selection 
(ACS) [9]. In their book, de Castro and Timmis provide a detailed treat- 
ment of CLONALG including a description of the approach (starting 
page 79) and a step through of the algorithm (starting page 99) [5]. 
Cutello and Nicosia provide a study of the clonal selection principle and 
algorithms inspired by the theory [4]. Brownlee provides a review of 
Clonal Selection algorithms providing a taxonomy, algorithm reviews, 
and a broader bibliography [1]. 
7.3 Negative Selection Algorithm 

Negative Selection Algorithm, NSA. 



7.3.1 Taxonomy 

The Negative Selection Algorithm belongs to the field of Artificial 
Immune Systems. The algorithm is related to other Artificial Immune 
Systems such as the Clonal Selection Algorithm (Section 7.2), and the 
Immune Network Algorithm (Section 7.5). 



7.3.2 Inspiration 

The Negative Selection algorithm is inspired by the self-nonself discrim- 
ination behavior observed in the mammalian acquired immune system. 
The clonal selection theory of acquired immunity accounts for the adap- 
tive behavior of the immune system including the ongoing selection and 
proliferation of cells that select-for potentially harmful (and typically 
foreign) material in the body. An interesting aspect of this process is 
that it is responsible for managing a population of immune cells that 
do not select-for the tissues of the body; specifically, it does not create 
self-reactive immune cells, a condition known as auto-immunity. This 
problem is known as 'self-nonself discrimination' and it involves the 
preparation and ongoing maintenance of a repertoire of immune cells such that 
none are auto-immune. This is achieved by a negative selection process 
that selects-for and removes those cells that are self-reactive during 
cell creation and cell proliferation. This process has been observed in 
the preparation of T-lymphocytes, naive versions of which are matured 
using both a positive and negative selection process in the thymus. 



7.3.3 Metaphor 

The self-nonself discrimination principle suggests that the anticipatory 
guesses made in clonal selection are filtered by regions of infeasibility 
(protein conformations that bind to self-tissues). Further, the self-nonself 
immunological paradigm proposes the modeling of the unknown domain 
(encountered pathogen) by modeling the complement of what is known. 
This is unintuitive as the natural inclination is to categorize unknown 
information by what is different from that which is known, rather than 
guessing at the unknown information and filtering those guesses by what 
is known. 
7.3.4 Strategy 

The information processing principles of the self-nonself discrimination 
process via negative selection are those of anomaly and change detection 
systems that model the anticipation of variation from what is known. 
The principle is achieved by building a model of changes, anomalies, 
or unknown (non-normal or non-self) data by generating patterns that 
do not match an existing corpus of available (self or normal) patterns. 
The prepared non-normal model is then used to either monitor the 
existing normal data or streams of new data by seeking matches to the 
non-normal patterns. 



7.3.5 Procedure 

Algorithm 7.3.1 provides a pseudocode listing of the detector genera- 
tion procedure for the Negative Selection Algorithm. Algorithm 7.3.2 
provides a pseudocode listing of the detector application procedure for 
the Negative Selection Algorithm. 



Algorithm 7.3.1: Pseudocode for detector generation. 

Input: SelfData 
Output: Repertoire 

Repertoire ← ∅; 
while ¬StopCondition() do 
  Detectors ← GenerateRandomDetectors(); 
  foreach Detector_i ∈ Detectors do 
    if ¬Matches(Detector_i, SelfData) then 
      Repertoire ← Detector_i; 
    end 
  end 
end 
return Repertoire; 
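Detector generation and application can be sketched for a real-valued representation as follows. This is a minimal illustration under stated assumptions: the method names, the two-dimensional unit-square domain, the `max_tries` stopping condition, and the sample data are all invented for the example, and an input is treated as non-self when any detector matches it, in line with the Strategy section.

```ruby
# Euclidean distance between two real-valued vectors.
def distance(a, b)
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

# A pattern "matches" a set if it lies within min_dist of any member.
def matches?(vector, dataset, min_dist)
  dataset.any? { |sample| distance(vector, sample) <= min_dist }
end

# Detector generation: keep random candidates that match no self sample.
# (Hypothetical helper; max_tries stands in for the stop condition.)
def generate_detectors(self_data, num_detectors, min_dist, rng = Random.new, max_tries = 100_000)
  detectors = []
  max_tries.times do
    break if detectors.size >= num_detectors
    candidate = [rng.rand, rng.rand]
    detectors << candidate unless matches?(candidate, self_data, min_dist)
  end
  detectors
end

# Detector application: an input matched by any detector is non-self.
def classify(input, detectors, min_dist)
  matches?(input, detectors, min_dist) ? "non-self" : "self"
end

self_data = [[0.6, 0.6], [0.75, 0.75], [0.9, 0.9]]
detectors = generate_detectors(self_data, 20, 0.05, Random.new(1))
puts "prepared #{detectors.size} detectors"
puts classify([0.75, 0.75], detectors, 0.05)
```

Because every retained detector lies farther than `min_dist` from every self sample, a known self sample can never be flagged as non-self at the same threshold; coverage of the non-self space depends on how many detectors are generated.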



7.3.6 Heuristics 

• The Negative Selection Algorithm was designed for change detec- 
tion, novelty detection, intrusion detection and similar pattern 
recognition and two-class classification problem domains. 

• Traditional negative selection algorithms used binary representa- 
tions and binary matching rules such as Hamming distance, and 
r-contiguous bits. 
Algorithm 7.3.2: Pseudocode for detector application. 

Input: InputSamples, Repertoire 

for Input_i ∈ InputSamples do 
  Input_i_class ← "self"; 
  foreach Detector_j ∈ Repertoire do 
    if Matches(Input_i, Detector_j) then 
      Input_i_class ← "non-self"; 
      Break; 
    end 
  end 
end 



• A data representation should be selected that is most suitable for 
a given problem domain, and a matching rule is in turn selected 
or tailored to the data representation. 

• Detectors can be prepared with no prior knowledge of the problem 
domain other than the known (normal or self) dataset. 

• The algorithm can be configured to balance between detector 
convergence (quality of the matches) and the space complexity 
(number of detectors). 

• The lack of dependence between detectors means that detector 
preparation and application is inherently parallel and suited for a 
distributed and parallel implementation, respectively. 
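
The binary matching rules named in the heuristics above can be sketched
as follows. This is a minimal illustration, assuming detectors and self
patterns are equal-length strings of '0' and '1' characters; the function
names, the max_tries cap, and the string representation are choices made
for this sketch and do not come from Listing 7.2.

```ruby
# Number of positions at which two equal-length bit strings differ.
def hamming_distance(a, b)
  a.chars.zip(b.chars).count { |x, y| x != y }
end

# True if the two bit strings agree in at least r contiguous positions.
def r_contiguous_match?(a, b, r)
  run = 0
  a.chars.zip(b.chars).each do |x, y|
    run = (x == y) ? run + 1 : 0
    return true if run >= r
  end
  false
end

# Negative selection over bit strings: keep random candidates that match
# no self string under the r-contiguous rule. max_tries is a hypothetical
# safety cap so the loop ends even when few valid detectors exist.
def generate_binary_detectors(self_set, num_detectors, length, r, max_tries = 10_000)
  detectors = []
  max_tries.times do
    break if detectors.size >= num_detectors
    candidate = Array.new(length) { rand(2).to_s }.join
    next if self_set.any? { |s| r_contiguous_match?(candidate, s, r) }
    detectors << candidate unless detectors.include?(candidate)
  end
  detectors
end
```

A smaller r makes the rule more permissive, so more candidates are
rejected as self-matching and fewer detectors survive.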

7.3.7 Code Listing 

Listing 7.2 provides an example of the Negative Selection Algorithm 
implemented in the Ruby Programming Language. The demonstration 
problem is a two-class classification problem where samples are drawn 
from a two-dimensional domain, where $x_i \in [0, 1]$. Those samples in
$1.0 > x_i > 0.5$ are classified as self and the rest of the space belongs to
the non-self class. Samples are drawn from the self class and presented 
to the algorithm for the preparation of pattern detectors for classifying 
unobserved samples from the non-self class. The algorithm creates a set 
of detectors that do not match the self data, and are then applied to a 
set of randomly generated samples from the domain. The algorithm uses 
a real-valued representation. The Euclidean distance function is used 
during matching and a minimum distance value is specified as a user 
parameter for approximate matches between patterns. The algorithm 
includes the additional computationally expensive check for duplicates 
in the preparation of the self dataset and the detector set. 
def random_vector(minmax)
  return Array.new(minmax.length) do |i|
    minmax[i][0] + ((minmax[i][1] - minmax[i][0]) * rand())
  end
end

def euclidean_distance(c1, c2)
  sum = 0.0
  c1.each_index {|i| sum += (c1[i]-c2[i])**2.0}
  return Math.sqrt(sum)
end

def contains?(vector, space)
  vector.each_with_index do |v,i|
    return false if v<space[i][0] or v>space[i][1]
  end
  return true
end

def matches?(vector, dataset, min_dist)
  dataset.each do |pattern|
    dist = euclidean_distance(vector, pattern[:vector])
    return true if dist <= min_dist
  end
  return false
end

def generate_detectors(max_detectors, search_space, self_dataset,
    min_dist)
  detectors = []
  begin
    detector = {:vector=>random_vector(search_space)}
    if !matches?(detector[:vector], self_dataset, min_dist)
      detectors << detector if !matches?(detector[:vector], detectors,
        0.0)
    end
  end while detectors.size < max_detectors
  return detectors
end

def generate_self_dataset(num_records, self_space, search_space)
  self_dataset = []
  begin
    pattern = {}
    pattern[:vector] = random_vector(search_space)
    next if matches?(pattern[:vector], self_dataset, 0.0)
    if contains?(pattern[:vector], self_space)
      self_dataset << pattern
    end
  end while self_dataset.length < num_records
  return self_dataset
end

def apply_detectors(detectors, bounds, self_dataset, min_dist,
    trials=50)
  correct = 0
  trials.times do |i|
    input = {:vector=>random_vector(bounds)}
    actual = matches?(input[:vector], detectors, min_dist) ? "N" : "S"
    expected = matches?(input[:vector], self_dataset, min_dist) ? "S" :
      "N"
    correct += 1 if actual==expected
    puts "#{i+1}/#{trials}: predicted=#{actual}, expected=#{expected}"
  end
  puts "Done. Result: #{correct}/#{trials}"
  return correct
end

def execute(bounds, self_space, max_detect, max_self, min_dist)
  self_dataset = generate_self_dataset(max_self, self_space, bounds)
  puts "Done: prepared #{self_dataset.size} self patterns."
  detectors = generate_detectors(max_detect, bounds, self_dataset,
    min_dist)
  puts "Done: prepared #{detectors.size} detectors."
  apply_detectors(detectors, bounds, self_dataset, min_dist)
  return detectors
end

if __FILE__ == $0
  # problem configuration
  problem_size = 2
  search_space = Array.new(problem_size) {[0.0, 1.0]}
  self_space = Array.new(problem_size) {[0.5, 1.0]}
  max_self = 150
  # algorithm configuration
  max_detectors = 300
  min_dist = 0.05
  # execute the algorithm
  execute(search_space, self_space, max_detectors, max_self, min_dist)
end



Listing 7.2: Negative Selection Algorithm in Ruby 



7.3.8 References 
Primary Sources 

The seminal negative selection algorithm was proposed by Forrest, et al. 
[5] in which a population of detectors is prepared in the presence of
known information, where those randomly generated detectors that 
match against known data are discarded. The population of pattern 
guesses in the unknown space then monitors the corpus of known infor- 
mation for changes. The algorithm was applied to the monitoring of 
files for changes (corruptions and infections by computer viruses), and 
later formalized as a change detection algorithm [2, 3]. 
Learn More

The Negative Selection algorithm has been applied to the monitoring 
of changes in the execution behavior of Unix processes [4, 8], and to 
monitor changes in remote connections of a network computer (intrusion 
detection) [6, 7]. The algorithm has been applied predominantly to
virus and host intrusion detection and their abstracted problems of
two-class classification and anomaly detection. Esponda provides some
interesting work showing the compression and privacy benefits provided
by maintaining a negative model (non-self) [1]. Ji and Dasgupta provide
a contemporary and detailed review of Negative Selection Algorithms 
covering topics such as data representations, matching rules, detector 
generation procedures, computational complexity, hybridization, and 
theoretical frameworks [9]. Recently, the validity of the application 
of negative selection algorithms in high-dimensional spaces has been 
questioned, specifically given the scalability of the approach in the face 
of the exponential increase in volume within the problem space [10]. 
7.4 Artificial Immune Recognition System

Artificial Immune Recognition System, AIRS. 

7.4.1 Taxonomy 

The Artificial Immune Recognition System belongs to the field of Artifi- 
cial Immune Systems, and more broadly to the field of Computational 
Intelligence. It was extended early to the canonical version called the
Artificial Immune Recognition System 2 (AIRS2) and provides the basis for
extensions such as the Parallel Artificial Immune Recognition System [8].
It is related to other Artificial Immune System algorithms such as the 
Dendritic Cell Algorithm (Section 7.6), the Clonal Selection Algorithm 
(Section 7.2), and the Negative Selection Algorithm (Section 7.3). 

7.4.2 Inspiration 

The Artificial Immune Recognition System is inspired by the Clonal 
Selection theory of acquired immunity. The clonal selection theory 
credited to Burnet was proposed to account for the behavior and ca- 
pabilities of antibodies in the acquired immune system [1, 2]. Inspired 
itself by the principles of Darwinian natural selection theory of evolution, 
the theory proposes that antigens select-for lymphocytes (both B and 
T-cells). When a lymphocyte is selected and binds to an antigenic 
determinant, the cell proliferates making many thousands more copies 
of itself and differentiates into different cell types (plasma and memory 
cells). Plasma cells have a short lifespan and produce vast quantities of 
antibody molecules, whereas memory cells live for an extended period in 
the host anticipating future recognition of the same determinant. The 
important feature of the theory is that when a cell is selected and pro- 
liferates, it is subjected to small copying errors (changes to the genome 
called somatic hypermutation) that change the shape of the expressed 
receptors. It also affects the subsequent determinant recognition
capabilities of both the antibodies bound to the lymphocyte's cell surface,
and the antibodies that plasma cells produce.

7.4.3 Metaphor 

The theory suggests that starting with an initial repertoire of general 
immune cells, the system is able to change itself (the compositions and 
densities of cells and their receptors) in response to experience with the 
environment. Through a blind process of selection and accumulated 
variation on the large scale of many billions of cells, the acquired immune 
system is capable of acquiring the necessary information to protect the 
host organism from the specific pathogenic dangers of the environment. 
It also suggests that the system must anticipate (guess at) the pathogens
to which it will be exposed, and requires exposure to pathogens that
may harm the host before it can acquire the necessary information to
provide a defense.

7.4.4 Strategy 

The information processing objective of the technique is to prepare a 
set of real-valued vectors to classify patterns. The Artificial Immune 
Recognition System maintains a pool of memory cells that are prepared 
by exposing the system to a single iteration of the training data. Candi- 
date memory cells are prepared when the memory cells are insufficiently 
stimulated for a given input pattern. A process of cloning and mutation 
of cells occurs for the most stimulated memory cell. The clones compete 
with each other for entry into the memory pool based on stimulation and 
on the amount of resources each cell is using. This concept of resources 
comes from prior work on Artificial Immune Networks, where a single 
cell (an Artificial Recognition Ball or ARB) represents a set of similar 
cells. Here, a cell's resources are a function of its stimulation to a given 
input pattern and the number of clones it may create. 
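
The resource competition described above can be sketched as follows. This
is an illustrative outline only, assuming each cell is a Hash with a
:stimulation value already computed for the current input pattern; the
function name allocate_and_prune and its parameters are hypothetical and
are not taken from Listing 7.3.

```ruby
# Allocate resources in proportion to stimulation, then drop the cells
# holding the fewest resources until the pool fits within max_resources.
# Returns the surviving cells, most resourced first.
def allocate_and_prune(pool, clone_rate, max_resources)
  pool.each { |cell| cell[:resources] = cell[:stimulation] * clone_rate }
  survivors = pool.sort_by { |cell| -cell[:resources] }
  total = survivors.sum { |cell| cell[:resources] }
  # Always keep at least one cell so a candidate can still be chosen.
  while total > max_resources && survivors.size > 1
    total -= survivors.pop[:resources]
  end
  survivors
end
```

Because resources scale with stimulation, a small resource budget keeps
only the cells that respond most strongly to the input pattern.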

7.4.5 Procedure 

Algorithm 7.4.1 provides a high-level pseudocode for preparing memory
cell vectors using the Artificial Immune Recognition System, specifically 
the canonical AIRS2. An affinity (distance) measure between input 
patterns must be defined. For real-valued vectors, this is commonly the 
Euclidean distance:

    dist(x, c) = \sqrt{\sum_{i=1}^{n} (x_i - c_i)^2}    (7.1)

where n is the number of attributes, x is the input vector and c 
is a given cell vector. The variation of cells during cloning (somatic 
hypermutation) occurs inversely proportional to the stimulation of a 
given cell to an input pattern. 
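
A minimal sketch of how the distance measure may be turned into affinity,
stimulation, and a mutation range is given below. It assumes that all
attributes are normalized to [0, 1], so that the largest possible distance
between two n-dimensional vectors is the square root of n; the function
names are illustrative rather than part of the AIRS2 specification.

```ruby
# Euclidean distance between an input vector x and a cell vector c.
def euclidean_distance(x, c)
  Math.sqrt(x.zip(c).sum { |xi, ci| (xi - ci)**2 })
end

# Affinity: distance scaled into [0, 1] by the largest distance possible
# in the unit hypercube (an assumption of this sketch).
def affinity(x, c)
  euclidean_distance(x, c) / Math.sqrt(x.size)
end

# Stimulation is the complement of affinity: near cells are highly stimulated.
def stimulation(x, c)
  1.0 - affinity(x, c)
end

# Mutation range shrinks as stimulation grows (inverse relationship).
def mutation_range(stim)
  1.0 - stim
end
```

Under these assumptions a cell that coincides with the input pattern has a
stimulation of 1.0 and a mutation range of 0.0, so its clones are copies.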

7.4.6 Heuristics 

• The AIRS was designed as a supervised algorithm for classification 
problem domains. 

• The AIRS is non-parametric, meaning that it does not rely on
assumptions about the structure of the function that it is
approximating.

Algorithm 7.4.1: Pseudocode for AIRS2.
Input: InputPatterns, clone_rate, mutate_rate, stim_thresh,
       resources_max, affinity_thresh
Output: Cells_memory

Cells_memory ← InitializeMemoryPool(InputPatterns);
foreach InputPattern_i ∈ InputPatterns do
    Stimulate(Cells_memory, InputPatterns);
    Cell_best ← GetMostStimulated(InputPattern_i, Cells_memory);
    if Cell_best^class ≠ InputPattern_i^class then
        Cells_memory ← CreateNewMemoryCell(InputPattern_i);
    else
        Clones_num ← Cell_best^stim × clone_rate × mutate_rate;
        Cells_clones ← Cell_best;
        for i to Clones_num do
            Cells_clones ← CloneAndMutate(Cell_best);
        end
        while AverageStimulation(Cells_clones) < stim_thresh do
            foreach Cell_i ∈ Cells_clones do
                Cells_clones ← CloneAndMutate(Cell_i);
            end
            Stimulate(Cells_clones, InputPatterns);
            ReducePoolToMaximumResources(Cells_clones, resources_max);
        end
        Cell_c ← GetMostStimulated(InputPattern_i, Cells_clones);
        if Cell_c^stim > Cell_best^stim then
            Cells_memory ← Cell_c;
            if Affinity(Cell_c, Cell_best) < affinity_thresh then
                DeleteCell(Cell_best, Cells_memory);
            end
        end
    end
end
return Cells_memory;



• Real-values in input vectors should be normalized such that
$x \in [0, 1)$.

• Euclidean distance is commonly used to measure the distance
between real-valued vectors (affinity calculation), although other
distance measures may be used (such as dot product), and data-specific
distance measures may be required for non-scalar attributes.

• Cells may be initialized with small random values or more com- 
monly with values from instances in the training set. 

• A cell's affinity is typically minimizing, whereas a cell's stimulation
is maximizing and typically $\in [0, 1]$.
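
The normalization heuristic above might be applied as in the following
sketch, which rescales each attribute by its observed minimum and maximum.
The function name normalize and the small epsilon used to keep values
strictly below 1.0 are assumptions of this example, not part of the AIRS
description.

```ruby
# Rescale every attribute (column) of a dataset into [0, 1).
# dataset is an Array of equal-length numeric Arrays (one row per pattern).
def normalize(dataset, epsilon = 1e-9)
  columns = dataset.transpose
  mins = columns.map(&:min)
  maxs = columns.map(&:max)
  dataset.map do |row|
    row.each_with_index.map do |v, i|
      range = maxs[i] - mins[i]
      # A constant attribute carries no information; map it to 0.0.
      range.zero? ? 0.0 : (v - mins[i]) / (range + epsilon)
    end
  end
end
```

In practice the minimum and maximum would be taken from the training data
and reused for unseen patterns, which may then need clipping into range.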

7.4.7 Code Listing 

Listing 7.3 provides an example of the Artificial Immune Recognition
System implemented in the Ruby Programming Language. The problem is a
contrived classification problem in a 2-dimensional domain
$x \in [0, 1]$, $y \in [0, 1]$ with two classes: 'A'
($x \in [0, 0.4999999]$, $y \in [0, 0.4999999]$) and 'B'
($x \in [0.5, 1]$, $y \in [0.5, 1]$).

The algorithm is an implementation of the AIRS2 algorithm [7]. An
initial pool of memory cells is created, one cell for each class. Euclidean
distance divided by the maximum possible distance in the domain is
taken as the affinity and stimulation is taken as 1.0 - affinity. The
meta-dynamics for memory cells (competition for input patterns) is not
performed and may be added into the implementation as an extension.

def random_vector(minmax)
  return Array.new(minmax.size) do |i|
    minmax[i][0] + ((minmax[i][1] - minmax[i][0]) * rand())
  end
end

def generate_random_pattern(domain)
  class_label = domain.keys[rand(domain.keys.size)]
  pattern = {:label=>class_label}
  pattern[:vector] = random_vector(domain[class_label])
  return pattern
end

def create_cell(vector, class_label)
  return {:label=>class_label, :vector=>vector}
end

def initialize_cells(domain)
  mem_cells = []
  domain.keys.each do |key|
    mem_cells << create_cell(random_vector([[0,1],[0,1]]), key)
  end
  return mem_cells
end

def distance(c1, c2)
  sum = 0.0
  c1.each_index {|i| sum += (c1[i]-c2[i])**2.0}
  return Math.sqrt(sum)
end

def stimulate(cells, pattern)
  max_dist = distance([0.0,0.0], [1.0,1.0])
  cells.each do |cell|
    cell[:affinity] = distance(cell[:vector], pattern[:vector]) /
      max_dist
    cell[:stimulation] = 1.0 - cell[:affinity]
  end
end

def get_most_stimulated_cell(mem_cells, pattern)
  stimulate(mem_cells, pattern)
  return mem_cells.sort{|x,y| y[:stimulation] <=> x[:stimulation]}.first
end

def mutate_cell(cell, best_match)
  range = 1.0 - best_match[:stimulation]
  cell[:vector].each_with_index do |v,i|
    min = [(v-(range/2.0)), 0.0].max
    max = [(v+(range/2.0)), 1.0].min
    cell[:vector][i] = min + (rand() * (max-min))
  end
  return cell
end

def create_arb_pool(pattern, best_match, clone_rate, mutate_rate)
  pool = []
  # copy the vector so that mutating a clone does not alter its parent
  pool << create_cell(best_match[:vector].dup, best_match[:label])
  num_clones = (best_match[:stimulation] * clone_rate *
    mutate_rate).round
  num_clones.times do
    cell = create_cell(best_match[:vector].dup, best_match[:label])
    pool << mutate_cell(cell, best_match)
  end
  return pool
end

def competition_for_resources(pool, clone_rate, max_res)
  pool.each {|cell| cell[:resources] = cell[:stimulation] * clone_rate}
  # sort descending so that the least-resourced cells are removed first
  pool.sort!{|x,y| y[:resources] <=> x[:resources]}
  total_resources = pool.inject(0.0){|sum,cell| sum + cell[:resources]}
  while total_resources > max_res
    cell = pool.delete_at(pool.size-1)
    total_resources -= cell[:resources]
  end
end

def refine_arb_pool(pool, pattern, stim_thresh, clone_rate, max_res)
  mean_stim, candidate = 0.0, nil
  begin
    stimulate(pool, pattern)
    candidate = pool.sort{|x,y| y[:stimulation] <=>
      x[:stimulation]}.first
    mean_stim = pool.inject(0.0){|s,c| s + c[:stimulation]} / pool.size
    if mean_stim < stim_thresh
      competition_for_resources(pool, clone_rate, max_res)
      pool.size.times do |i|
        cell = create_cell(pool[i][:vector].dup, pool[i][:label])
        mutate_cell(cell, pool[i])
        pool << cell
      end
    end
  end until mean_stim >= stim_thresh
  return candidate
end

def add_candidate_to_memory_pool(candidate, best_match, mem_cells)
  if candidate[:stimulation] > best_match[:stimulation]
    mem_cells << candidate
  end
end

def classify_pattern(mem_cells, pattern)
  stimulate(mem_cells, pattern)
  return mem_cells.sort{|x,y| y[:stimulation] <=> x[:stimulation]}.first
end

def train_system(mem_cells, domain, num_patterns, clone_rate,
    mutate_rate, stim_thresh, max_res)
  num_patterns.times do |i|
    pattern = generate_random_pattern(domain)
    best_match = get_most_stimulated_cell(mem_cells, pattern)
    if best_match[:label] != pattern[:label]
      mem_cells << create_cell(pattern[:vector], pattern[:label])
    elsif best_match[:stimulation] < 1.0
      pool = create_arb_pool(pattern, best_match, clone_rate,
        mutate_rate)
      cand = refine_arb_pool(pool, pattern, stim_thresh, clone_rate,
        max_res)
      add_candidate_to_memory_pool(cand, best_match, mem_cells)
    end
    puts " > iter=#{i+1}, mem_cells=#{mem_cells.size}"
  end
end

def test_system(mem_cells, domain, num_trials=50)
  correct = 0
  num_trials.times do
    pattern = generate_random_pattern(domain)
    best = classify_pattern(mem_cells, pattern)
    correct += 1 if best[:label] == pattern[:label]
  end
  puts "Finished test with a score of #{correct}/#{num_trials}"
  return correct
end

def execute(domain, num_patterns, clone_rate, mutate_rate, stim_thresh,
    max_res)
  mem_cells = initialize_cells(domain)
  train_system(mem_cells, domain, num_patterns, clone_rate,
    mutate_rate, stim_thresh, max_res)
  test_system(mem_cells, domain)
  return mem_cells
end

if __FILE__ == $0
  # problem configuration
  domain = {"A"=>[[0,0.4999999],[0,0.4999999]],"B"=>[[0.5,1],[0.5,1]]}
  num_patterns = 50
  # algorithm configuration
  clone_rate = 10
  mutate_rate = 2.0
  stim_thresh = 0.9
  max_res = 150
  # execute the algorithm
  execute(domain, num_patterns, clone_rate, mutate_rate, stim_thresh,
    max_res)
end



Listing 7.3: AIRS in Ruby 



7.4.8 References 
Primary Sources 

The Artificial Immune Recognition System was proposed in the Master's
work by Watkins [10], and later published [11]. Early works included the
application of the AIRS by Watkins and Boggess to a suite of benchmark
classification problems [6], and a similar study by Goodman and Boggess
comparing to a conceptually similar approach called Learning Vector 
Quantization [3]. 



Learn More 

Marwah and Boggess investigated the algorithm seeking issues that
affect the algorithm's performance [5]. They compared several variations
of the algorithm with modified resource allocation schemes, tie-handling
within the ARB pool, and ARB pool organization. Watkins and Timmis
proposed a new version of the algorithm called AIRS2 which became
the replacement for AIRS1 [7]. The updates reduced the complexity
of the approach while maintaining the accuracy of the results. An
investigation by Goodman et al. into the so-called 'source of power' in
AIRS indicated that perhaps the memory cell maintenance procedures
played an important role [4]. Watkins et al. provide a detailed review of
the technique and its application [9].
7.5 Immune Network Algorithm

Artificial Immune Network, aiNet, Optimization Artificial Immune Net- 
work, opt-aiNet. 

7.5.1 Taxonomy 

The Artificial Immune Network algorithm (aiNet) is an Immune Network
Algorithm from the field of Artificial Immune Systems. It is related to 
other Artificial Immune System algorithms such as the Clonal Selection 
Algorithm (Section 7.2), the Negative Selection Algorithm (Section 7.3), 
and the Dendritic Cell Algorithm (Section 7.6). The Artificial Immune 
Network algorithm includes the base version and the extension for opti- 
mization problems called the Optimization Artificial Immune Network 
algorithm (opt-aiNet). 

7.5.2 Inspiration 

The Artificial Immune Network algorithm is inspired by the Immune 
Network theory of the acquired immune system. The clonal selection 
theory of acquired immunity accounts for the adaptive behavior of the 
immune system including the ongoing selection and proliferation of cells 
that select-for potentially harmful (and typically foreign) material in 
the body. A concern of the clonal selection theory is that it presumes
that the repertoire of reactive cells remains idle when there is no
pathogen to which to respond. Jerne proposed an Immune Network
Theory (Idiotypic Networks) where immune cells are not at rest in the
absence of pathogen; instead, antibody and immune cells recognize and
respond to each other [6-8].

The Immune Network theory proposes that antibody (both free 
floating and surface bound) possess idiotopes (surface features) to which 
the receptors of other antibody can bind. As a result of receptor interac- 
tions, the repertoire becomes dynamic, where receptors continually both 
inhibit and excite each other in complex regulatory networks (chains 
of receptors). The theory suggests that the clonal selection process 
may be triggered by the idiotopes of other immune cells and molecules 
in addition to the surface characteristics of pathogen, and that the 
maturation process applies both to the receptors themselves and the 
idiotopes which they expose. 

7.5.3 Metaphor 

The immune network theory has interesting resource maintenance and 
signaling information processing properties. The classical clonal selection 
and negative selection paradigms integrate the accumulative and filtered 
learning of the acquired immune system, whereas the immune network
theory proposes an additional order of complexity between the cells and
molecules under selection. In addition to cells that interact directly
with pathogen, there are cells that interact with those reactive cells
and with pathogen indirectly, in successive layers such that networks of
activity form higher-order structures such as internal images of pathogen
(promotion), and regulatory networks (so-called anti-idiotopes and
anti-anti-idiotopes).

7.5.4 Strategy 

The objective of the immune network process is to prepare a repertoire 
of discrete pattern detectors for a given problem domain, where better 
performing cells suppress low-affinity (similar) cells in the network. This 
principle is achieved through an interactive process of exposing the 
population to external information to which it responds with both a 
clonal selection response and internal meta-dynamics of intra-population
responses that stabilize the responses of the population to the external
stimuli.
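
The suppression of similar cells described above can be sketched as a
greedy filter. This is one simple variant written for illustration, and it
differs from the neighborhood-based function in Listing 7.4; it assumes
each cell is a Hash with a :vector and a :cost (lower is better), and the
name suppress is hypothetical.

```ruby
# Euclidean distance between two real-valued vectors.
def distance(a, b)
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

# Visit cells from best (lowest cost) to worst and keep a cell only if no
# already-kept cell lies within aff_thresh of it.
def suppress(cells, aff_thresh)
  kept = []
  cells.sort_by { |c| c[:cost] }.each do |cell|
    crowded = kept.any? { |k| distance(k[:vector], cell[:vector]) < aff_thresh }
    kept << cell unless crowded
  end
  kept
end
```

Raising aff_thresh removes more near-duplicates and keeps the population
small, at the risk of discarding cells that sit on distinct nearby optima.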

7.5.5 Procedure 

Algorithm 7.5.1 provides a pseudocode listing of the Optimization Ar- 
tificial Immune Network algorithm (opt-aiNet) for minimizing a cost 
function. 

7.5.6 Heuristics 

• aiNet is designed for unsupervised clustering, whereas the
opt-aiNet extension was designed for pattern recognition and
optimization, specifically multi-modal function optimization.

• The amount of mutation of clones is proportionate to the affinity
of the parent cell with the cost function (better fitness, lower
mutation).

• The addition of random cells each iteration adds a
random-restart-like capability to the algorithm.

• Suppression based on cell similarity provides a mechanism for
reducing redundancy.

• The population size is dynamic, and if it continues to grow it may
be an indication of a problem with many local optima or that the
affinity threshold may need to be increased.

Algorithm 7.5.1: Pseudocode for opt-aiNet.
Input: Population_size, ProblemSize, N_clones, N_random,
       AffinityThreshold
Output: S_best

Population ← InitializePopulation(Population_size, ProblemSize);
while ¬StopCondition() do
    EvaluatePopulation(Population);
    S_best ← GetBestSolution(Population);
    Progeny ← ∅;
    Cost_avg ← CalculateAveragePopulationCost(Population);
    while CalculateAveragePopulationCost(Population) > Cost_avg do
        foreach Cell_i ∈ Population do
            Clones ← CreateClones(Cell_i, N_clones);
            foreach Clone_i ∈ Clones do
                Clone_i ← MutateRelativeToFitnessOfParent(Clone_i, Cell_i);
            end
            EvaluatePopulation(Clones);
            Progeny ← GetBestSolution(Clones);
        end
    end
    SupressLowAffinityCells(Progeny, AffinityThreshold);
    Progeny ← CreateRandomCells(N_random);
    Population ← Progeny;
end
return S_best;



• Affinity proportionate mutation is performed using
$c' = c + \alpha \times N(0, 1)$ where
$\alpha = \frac{1}{\beta} \times \exp(-f)$, $N$ is a Gaussian random
number, and $f$ is the fitness of the parent cell. $\beta$ controls the
decay of the function and can be set to 100.

• The affinity threshold is problem and representation specific, for
example an AffinityThreshold may be set to an arbitrary value
such as 0.1 on a continuous function domain, or calculated as a
percentage of the size of the problem space.

• The number of random cells inserted may be 40% of the population 
size. 

• The number of clones created for a cell may be small, such as 10. 
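
The affinity proportionate mutation heuristic above can be illustrated
with a short sketch. It assumes the parent's fitness has been normalized
to [0, 1] with 1.0 being best, and it uses a Box-Muller transform for the
Gaussian draw; the function names and the optional rng argument are
choices made for this example.

```ruby
# Step size: alpha = (1 / beta) * exp(-fitness).
def mutation_rate(beta, fitness)
  (1.0 / beta) * Math.exp(-fitness)
end

# Standard normal sample via the Box-Muller transform.
def gaussian(rng = Random.new)
  u1 = 1.0 - rng.rand # lies in (0, 1], which avoids log(0)
  u2 = rng.rand
  Math.sqrt(-2.0 * Math.log(u1)) * Math.cos(2.0 * Math::PI * u2)
end

# Return a mutated copy of vector; fitter parents take smaller steps.
def mutate(vector, beta, fitness, rng = Random.new)
  alpha = mutation_rate(beta, fitness)
  vector.map { |v| v + alpha * gaussian(rng) }
end
```

With beta set to 100 the step size falls from 0.01 for the least fit
parent to roughly a third of that for the fittest, so good cells are only
perturbed slightly.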
7.5.7 Code Listing

Listing 7.4 provides an example of the Optimization Artificial Immune
Network (opt-aiNet) implemented in the Ruby Programming
Language. The demonstration problem is an instance of a continuous
function optimization that seeks $\min f(x)$ where
$f = \sum_{i=1}^{n} x_i^{2}$, $-5.0 \leq x_i \leq 5.0$ and $n = 2$. The
optimal solution for this basin function is
$(v_0, \ldots, v_{n-1}) = 0.0$. The algorithm is an implementation based on
the specification by de Castro and Von Zuben [1].



def objective_function(vector)
  return vector.inject(0.0) {|sum, x| sum + (x**2.0)}
end

def random_vector(minmax)
  return Array.new(minmax.size) do |i|
    minmax[i][0] + ((minmax[i][1] - minmax[i][0]) * rand())
  end
end

def random_gaussian(mean=0.0, stdev=1.0)
  u1 = u2 = w = 0
  begin
    u1 = 2 * rand() - 1
    u2 = 2 * rand() - 1
    w = u1 * u1 + u2 * u2
  end while w >= 1
  w = Math.sqrt((-2.0 * Math.log(w)) / w)
  return mean + (u2 * w) * stdev
end

def clone(parent)
  v = Array.new(parent[:vector].size) {|i| parent[:vector][i]}
  return {:vector=>v}
end

def mutation_rate(beta, normalized_cost)
  return (1.0/beta) * Math.exp(-normalized_cost)
end

def mutate(beta, child, normalized_cost)
  child[:vector].each_with_index do |v, i|
    alpha = mutation_rate(beta, normalized_cost)
    child[:vector][i] = v + alpha * random_gaussian()
  end
end

def clone_cell(beta, num_clones, parent)
  clones = Array.new(num_clones) {clone(parent)}
  clones.each {|clone| mutate(beta, clone, parent[:norm_cost])}
  clones.each{|c| c[:cost] = objective_function(c[:vector])}
  clones.sort!{|x,y| x[:cost] <=> y[:cost]}
  return clones.first
end
def calculate_normalized_cost(pop)
  pop.sort!{|x,y| x[:cost]<=>y[:cost]}
  range = pop.last[:cost] - pop.first[:cost]
  if range == 0.0
    pop.each {|p| p[:norm_cost] = 1.0}
  else
    pop.each {|p| p[:norm_cost] = 1.0-(p[:cost]/range)}
  end
end

def average_cost(pop)
  sum = pop.inject(0.0){|sum,x| sum + x[:cost]}
  return sum / pop.size.to_f
end

def distance(c1, c2)
  sum = 0.0
  c1.each_index {|i| sum += (c1[i]-c2[i])**2.0}
  return Math.sqrt(sum)
end

def get_neighborhood(cell, pop, aff_thresh)
  neighbors = []
  pop.each do |p|
    neighbors << p if distance(p[:vector], cell[:vector]) < aff_thresh
  end
  return neighbors
end

def affinity_supress(population, aff_thresh)
  pop = []
  population.each do |cell|
    neighbors = get_neighborhood(cell, population, aff_thresh)
    neighbors.sort!{|x,y| x[:cost] <=> y[:cost]}
    pop << cell if neighbors.empty? or cell.equal?(neighbors.first)
  end
  return pop
end

def search(search_space, max_gens, pop_size, num_clones, beta,
    num_rand, aff_thresh)
  pop = Array.new(pop_size) {|i| {:vector=>random_vector(search_space)} }
  pop.each{|c| c[:cost] = objective_function(c[:vector])}
  best = nil
  max_gens.times do |gen|
    pop.each{|c| c[:cost] = objective_function(c[:vector])}
    calculate_normalized_cost(pop)
    pop.sort!{|x,y| x[:cost] <=> y[:cost]}
    best = pop.first if best.nil? or pop.first[:cost] < best[:cost]
    avgCost, progeny = average_cost(pop), nil
    begin
      progeny=Array.new(pop.size){|i| clone_cell(beta, num_clones,
        pop[i])}
    end until average_cost(progeny) < avgCost
    pop = affinity_supress(progeny, aff_thresh)
    num_rand.times {pop << {:vector=>random_vector(search_space)}}
    puts " > gen #{gen+1}, popSize=#{pop.size}, fitness=#{best[:cost]}"
  end
  return best
end

if __FILE__ == $0
  # problem configuration
  problem_size = 2
  search_space = Array.new(problem_size) {|i| [-5, +5]}
  # algorithm configuration
  max_gens = 150
  pop_size = 20
  num_clones = 10
  beta = 100
  num_rand = 2
  aff_thresh = (search_space[0][1]-search_space[0][0])*0.05
  # execute the algorithm
  best = search(search_space, max_gens, pop_size, num_clones, beta,
    num_rand, aff_thresh)
  puts "done! Solution: f=#{best[:cost]}, s=#{best[:vector].inspect}"
end



Listing 7.4: Optimization Artificial Immune Network in Ruby 



7.5.8 References 
Primary Sources 

Early works, such as Farmer et al. [5], hinted at exploiting the
information processing properties of network theory for machine learning.
A seminal network theory based algorithm was proposed by Timmis et al. 
for clustering problems called the Artificial Immune Network (AIN) [11] 
that was later extended and renamed the Resource Limited Artificial 
Immune System [12] and Artificial Immune Network (AINE) [9]. The 
Artificial Immune Network (aiNet) algorithm was proposed by de Castro 
and Von Zuben that extended the principles of the Artificial Immune 
Network (AIN) and the Clonal Selection Algorithm (CLONALG) and 
was applied to clustering [2]. The aiNet algorithm was further extended 
to optimization domains and renamed opt-aiNet [1]. 

Learn More 

The authors de Castro and Von Zuben provide a detailed presentation 
of the aiNet algorithm as a book chapter that includes immunological 
theory, a description of the algorithm, and demonstration application to 
clustering problem instances [3]. Timmis and Edmonds provide a careful
examination of the opt-aiNet algorithm and propose some modifications 
and augmentations to improve its applicability and performance for 
multimodal function optimization problem domains [10]. The authors 
de Franca, Von Zuben, and de Castro proposed an extension to opt-aiNet
called dopt-aiNet that provided a number of enhancements and adapted
its capability for dynamic function optimization problems [4].
7.6 Dendritic Cell Algorithm

Dendritic Cell Algorithm, DCA.

7.6.1 Taxonomy 

The Dendritic Cell Algorithm belongs to the field of Artificial Immune 
Systems, and more broadly to the field of Computational Intelligence. 
The Dendritic Cell Algorithm is the basis for extensions such as the 
Deterministic Dendritic Cell Algorithm (dDCA) [2]. It is generally 
related to other Artificial Immune System algorithms such as the Clonal 
Selection Algorithm (Section 7.2), and the Immune Network Algorithm 
(Section 7.5). 

7.6.2 Inspiration 

The Dendritic Cell Algorithm is inspired by the Danger Theory of the
mammalian immune system, and specifically the role and function of
dendritic cells. The Danger Theory was proposed by Matzinger and
suggests that the role of the acquired immune system is to respond to
signals of danger, rather than discriminating self from non-self [7, 8].
The theory suggests that antigen presenting cells (such as helper
T-cells) activate an alarm signal providing the necessary co-stimulation
of antigen-specific cells to respond. Dendritic cells are a type of cell
from the innate immune system that respond to some specific forms of
danger signals. There are three main types of dendritic cells: 'immature'
cells that collect parts of the antigen and the signals, 'semi-mature'
cells that are immature cells that internally decide that the local signals
represent safety and present the antigen to T-cells resulting in tolerance,
and 'mature' cells that internally decide that the local signals represent
danger and present the antigen to T-cells resulting in a reactive response.

7.6.3 Strategy 

The information processing objective of the algorithm is to prepare a
set of mature dendritic cells (prototypes) that provide context-specific
information about how to classify normal and anomalous input patterns.
This is achieved as a system of three asynchronous processes: 1)
migrating sufficiently stimulated immature cells, 2) promoting migrated
cells to semi-mature (safe) or mature (danger) status depending on
their accumulated response, and 3) labeling observed patterns as safe or
dangerous based on the composition of the sub-population of cells that
respond to each pattern.

7.6.4 Procedure

Algorithm 7.6.1 provides pseudocode for training a pool of cells in
the Dendritic Cell Algorithm, specifically the Deterministic Dendritic
Cell Algorithm. Mature migrated cells associate their collected input
patterns with anomalies, whereas semi-mature migrated cells associate
their collected input patterns with normal behavior. The resulting
migrated cells can then be used to classify input patterns as normal or
anomalous. This can be done through sampling the cells and using a
voting mechanism, or more elaborate methods such as a 'mature context
antigen value' (MCAV) that uses $\frac{M}{Ag}$ (where $M$ is the number of
mature cells with the antigen and $Ag$ is the sum of the exposures to the
antigen by those mature cells), which gives a probability of a pattern
being an anomaly.
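The MCAV calculation described above can be sketched in a few lines of Ruby. This is a minimal illustration rather than a definitive implementation: the helper name `mcav`, the `:type` key, and the sample cells are assumptions made for the example, and an antigen that no mature cell has collected is assumed to score 0.0.

```ruby
# Hypothetical helper: MCAV for one antigen, computed as M / Ag, where M is
# the number of mature cells holding the antigen and Ag is the sum of those
# cells' exposures to it.
def mcav(migrated_cells, antigen)
  mature = migrated_cells.select do |cell|
    cell[:type] == :mature && cell[:antigen].key?(antigen)
  end
  exposures = mature.sum { |cell| cell[:antigen][antigen] }
  return 0.0 if exposures.zero? # assumption: unseen antigens score zero
  mature.size.to_f / exposures
end

# Illustrative migrated cells; each maps antigen => number of exposures.
cells = [
  { type: :mature, antigen: { 10 => 2, 7 => 1 } },
  { type: :mature, antigen: { 10 => 1 } },
  { type: :semimature, antigen: { 10 => 5, 3 => 4 } }
]

puts "MCAV(10) = #{mcav(cells, 10)}" # two mature cells over three exposures
puts "MCAV(3)  = #{mcav(cells, 3)}"  # only a semi-mature cell has seen 3
```

A threshold on the returned value (for example 0.5) would then turn the score into a normal or anomalous label.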

7.6.5 Heuristics 

• The Dendritic Cell Algorithm is not specifically a classification
algorithm; it may be considered a data filtering method for use in
anomaly detection problems.

• The canonical algorithm is designed to operate on a single discrete,
categorical or ordinal input and two problem-specific probabilistic
signals indicating the heuristic danger or safety of the input.

• The danger and safe signals are problem-specific signals of the
risk that the input pattern is an anomaly or is normal, both
typically ∈ [0, 100].

• The danger and safe signals do not have to be reciprocal, meaning
they may provide conflicting information.

• The system was designed to be used in real-time anomaly detection
problems, not just static problems.

• Each cell's migration threshold is set separately, typically ∈ [5, 15].
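The way the danger and safe signals are combined for each pattern can be sketched as follows. This is a small illustration using the same two quantities as the pseudocode and listing in this section (a costimulation value and a context value); the helper name `cell_signals` and the sample signal values are assumptions for the example.

```ruby
# Hypothetical helper: derive the two per-pattern signals used by the dDCA.
def cell_signals(safe, danger)
  cms = safe + danger        # costimulation: total evidence seen by the cell
  k = danger - (2.0 * safe)  # context: positive values suggest danger
  return cms, k
end

# A strong danger signal with a weak safe signal.
cms, k = cell_signals(10.0, 60.0)
puts "cms=#{cms}, k=#{k}" # cms=70.0, k=40.0

# Conflicting signals: the safe signal counts double, so the context is negative.
cms, k = cell_signals(40.0, 60.0)
puts "cms=#{cms}, k=#{k}" # cms=100.0, k=-20.0
```

With a migration threshold in the typical range of [5, 15], either pattern alone would push an immature cell past its threshold, which is why cells in the demonstration migrate after very few exposures.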

7.6.6 Code Listing 

Listing 7.5 provides an example of the Dendritic Cell Algorithm
implemented in the Ruby Programming Language, specifically the
Deterministic Dendritic Cell Algorithm (dDCA). The problem is a contrived
anomaly-detection problem with ordinal inputs x ∈ [0, 50], where
non-zero values that divide by 10 with no remainder are considered
anomalies. Probabilistic safe and danger signal functions are provided,
suggesting danger signals correctly with P(danger) = 0.70, and safe
signals correctly with P(safe) = 0.95.

Algorithm 7.6.1: Pseudocode for the Dendritic Cell Algorithm.

Input: InputPatterns, iterations_max, cells_num, MigrationThresh_bounds
Output: MigratedCells

ImmatureCells ← InitializeCells(cells_num, MigrationThresh_bounds);
MigratedCells ← ∅;
for i = 1 to iterations_max do
    P_i ← SelectInputPattern(InputPatterns);
    k_i ← (P_i_danger − 2 × P_i_safe);
    cms_i ← (P_i_danger + P_i_safe);
    foreach Cell_i ∈ ImmatureCells do
        UpdateCellOutputSignals(Cell_i, k_i, cms_i);
        StoreAntigen(Cell_i, P_i_antigen);
        if Cell_i_lifespan ≤ 0 then
            ReInitializeCell(Cell_i);
        else if Cell_i_cms ≥ Cell_i_thresh then
            RemoveCell(ImmatureCells, Cell_i);
            ImmatureCells ← CreateNewCell(MigrationThresh_bounds);
            if Cell_i_k > 0 then
                Cell_i_type ← Mature;
            else
                Cell_i_type ← Semimature;
            end
            MigratedCells ← Cell_i;
        end
    end
end
return MigratedCells;



The algorithm is an implementation of the Deterministic Dendritic
Cell Algorithm (dDCA) as described in [2, 9], with verification from [5].
The algorithm was designed to be executed as three asynchronous
processes in a real-time or semi-real time environment. For demonstration
purposes, the implementation separates out the three main processes
and executes them sequentially as a training and cell promotion phase
followed by a test (labeling) phase.

def rand_in_bounds(min, max)
  return min + ((max-min) * rand())
end

def random_vector(search_space)
  return Array.new(search_space.size) do |i|
    rand_in_bounds(search_space[i][0], search_space[i][1])
  end
end

# build one input pattern with noisy safe and danger signals in [0, 100]
def construct_pattern(class_label, domain, p_safe, p_danger)
  set = domain[class_label]
  selection = rand(set.size)
  pattern = {}
  pattern[:class_label] = class_label
  pattern[:input] = set[selection]
  pattern[:safe] = (rand() * p_safe * 100)
  pattern[:danger] = (rand() * p_danger * 100)
  return pattern
end

def generate_pattern(domain, p_anomaly, p_normal, prob_create_anom=0.5)
  pattern = nil
  if rand() < prob_create_anom
    pattern = construct_pattern("Anomaly", domain, 1.0-p_normal,
      p_anomaly)
    puts ">Generated Anomaly [#{pattern[:input]}]"
  else
    pattern = construct_pattern("Normal", domain, p_normal,
      1.0-p_anomaly)
  end
  return pattern
end

# create a new cell, or reset an existing one when it is passed in
def initialize_cell(thresh, cell={})
  cell[:lifespan] = 1000.0
  cell[:k] = 0.0
  cell[:cms] = 0.0
  cell[:migration_threshold] = rand_in_bounds(thresh[0], thresh[1])
  cell[:antigen] = {}
  return cell
end

# count how many times the cell has been exposed to each input
def store_antigen(cell, input)
  if cell[:antigen][input].nil?
    cell[:antigen][input] = 1
  else
    cell[:antigen][input] += 1
  end
end

def expose_cell(cell, cms, k, pattern, threshold)
  cell[:cms] += cms
  cell[:k] += k
  cell[:lifespan] -= cms
  store_antigen(cell, pattern[:input])
  initialize_cell(threshold, cell) if cell[:lifespan] <= 0
end

def can_cell_migrate?(cell)
  return (cell[:cms]>=cell[:migration_threshold] and
    !cell[:antigen].empty?)
end

def expose_all_cells(cells, pattern, threshold)
  migrate = []
  cms = (pattern[:safe] + pattern[:danger])
  k = pattern[:danger] - (pattern[:safe] * 2.0)
  cells.each do |cell|
    expose_cell(cell, cms, k, pattern, threshold)
    if can_cell_migrate?(cell)
      migrate << cell
      # a positive context value marks the cell as mature (danger)
      cell[:class_label] = (cell[:k]>0) ? "Anomaly" : "Normal"
    end
  end
  return migrate
end

def train_system(domain, max_iter, num_cells, p_anomaly, p_normal,
    thresh)
  immature_cells = Array.new(num_cells){ initialize_cell(thresh) }
  migrated = []
  max_iter.times do |iter|
    pattern = generate_pattern(domain, p_anomaly, p_normal)
    migrants = expose_all_cells(immature_cells, pattern, thresh)
    # replace each migrated cell with a fresh immature cell
    migrants.each do |cell|
      immature_cells.delete(cell)
      immature_cells << initialize_cell(thresh)
      migrated << cell
    end
    puts "> iter=#{iter} new=#{migrants.size}, migrated=#{migrated.size}"
  end
  return migrated
end

def classify_pattern(migrated, pattern)
  input = pattern[:input]
  num_cells, num_antigen = 0, 0
  migrated.each do |cell|
    if cell[:class_label] == "Anomaly" and !cell[:antigen][input].nil?
      num_cells += 1
      num_antigen += cell[:antigen][input]
    end
  end
  # no mature cell has collected this input, so there is no MCAV to compute
  return "Normal" if num_antigen == 0
  mcav = num_cells.to_f / num_antigen.to_f
  return (mcav>0.5) ? "Anomaly" : "Normal"
end

def test_system(migrated, domain, p_anomaly, p_normal, num_trial=100)
  correct_norm = 0
  num_trial.times do
    pattern = construct_pattern("Normal", domain, p_normal,
      1.0-p_anomaly)
    class_label = classify_pattern(migrated, pattern)
    correct_norm += 1 if class_label == "Normal"
  end
  puts "Finished testing Normal inputs #{correct_norm}/#{num_trial}"
  correct_anom = 0
  num_trial.times do
    pattern = construct_pattern("Anomaly", domain, 1.0-p_normal,
      p_anomaly)
    class_label = classify_pattern(migrated, pattern)
    correct_anom += 1 if class_label == "Anomaly"
  end
  puts "Finished testing Anomaly inputs #{correct_anom}/#{num_trial}"
  return [correct_norm, correct_anom]
end

def execute(domain, max_iter, num_cells, p_anom, p_norm, thresh)
  migrated = train_system(domain, max_iter, num_cells, p_anom, p_norm,
    thresh)
  test_system(migrated, domain, p_anom, p_norm)
  return migrated
end

if __FILE__ == $0
  # problem configuration
  domain = {}
  domain["Normal"] = Array.new(50){|i| i}
  domain["Anomaly"] = Array.new(5){|i| (i+1)*10}
  domain["Normal"] = domain["Normal"] - domain["Anomaly"]
  p_anomaly = 0.70
  p_normal = 0.95
  # algorithm configuration
  iterations = 100
  num_cells = 10
  thresh = [5,15]
  # execute the algorithm
  execute(domain, iterations, num_cells, p_anomaly, p_normal, thresh)
end



Listing 7.5: Deterministic Dendritic Cell Algorithm in Ruby 



7.6.7 References 
Primary Sources 

The Dendritic Cell Algorithm was proposed by Greensmith, Aickelin
and Cayzer describing the inspiring biological system and providing
experimental results on a classification problem [4]. This work was
followed shortly by a second study into the algorithm by Greensmith,
Twycross, and Aickelin, focusing on computer security instances of
anomaly detection and classification problems [6].

Learn More 

The Dendritic Cell Algorithm was the focus of Greensmith's thesis,
which provides a detailed discussion of the method's abstraction from the
inspiring biological system, and a review of the technique's limitations
[1]. A formal presentation of the algorithm is provided by Greensmith et
al. [5]. Greensmith and Aickelin proposed the Deterministic Dendritic
Cell Algorithm (dDCA) that seeks to remove some of the stochastic
decisions from the method, reduce its complexity, and make
it more amenable to analysis [2]. Stibor et al. provide a theoretical
analysis of the Deterministic Dendritic Cell Algorithm, considering
the discrimination boundaries of single dendritic cells in the system [9].
Greensmith and Aickelin provide a detailed overview of the Dendritic
Cell Algorithm focusing on the information processing principles of the
inspiring biological systems as a book chapter [3].



Neural Algorithms



8.1 Overview 

This chapter describes Neural Algorithms. 

8.1.1 Biological Neural Networks 

A Biological Neural Network refers to the information processing
elements of the nervous system, organized as a collection of neural cells,
called neurons, that are interconnected in networks and interact with
each other using electrochemical signals. A biological neuron generally
comprises dendrites, which receive the input signals and are connected
to other neurons via synapses. The neuron reacts to input signals
and may produce an output signal on its output connection, called the
axon.

The study of biological neural networks falls within the domain of
neuroscience, which is a branch of biology concerned with the nervous
system. Neuroanatomy is a subject that is concerned with the
structure and function of groups of neural networks both with regard
to parts of the brain and the structures that lead from and to the
brain from the rest of the body. Neuropsychology is another discipline
concerned with the structure and function of the brain as they relate
to abstract psychological behaviors. For further information, refer to a
good textbook on any of these general topics.

8.1.2 Artificial Neural Networks 

The field of Artificial Neural Networks (ANN) is concerned with the
investigation of computational models inspired by theories and
observation of the structure and function of biological networks of neural
cells in the brain. They are generally designed as models for addressing
mathematical, computational, and engineering problems. As such, there
is a lot of interdisciplinary research in mathematics, neurobiology and
computer science.

An Artificial Neural Network is generally comprised of a collection
of artificial neurons that are interconnected in order to perform some
computation on input patterns and create output patterns. They are
adaptive systems capable of modifying their internal structure, typically
the weights between nodes in the network, allowing them to be used
for a variety of function approximation problems such as classification,
regression, feature extraction and content addressable memory.

Given that the focus of the field is on performing computation 
with networks of discrete computing units, the field is traditionally 
called a 'connectionist' paradigm of Artificial Intelligence and 'Neural 
Computation'. 

There are many types of neural networks, many of which fall into 
one of two categories: 

• Feed-forward Networks where input is provided on one side
of the network and the signals are propagated forward (in one
direction) through the network structure to the other side where
output signals are read. These networks may be comprised of
one cell, one layer or multiple layers of neurons. Some examples
include the Perceptron, Radial Basis Function Networks, and the
multi-layer perceptron networks.

• Recurrent Networks where cycles in the network are permitted 
and the structure may be fully interconnected. Examples include 
the Hopfield Network and Bidirectional Associative Memory. 

Artificial Neural Network structures are made up of nodes and 
weights which typically require training based on samples of patterns 
from a problem domain. Some examples of learning strategies include: 

• Supervised Learning where the network is exposed to the input 
that has a known expected answer. The internal state of the 
network is modified to better match the expected result. Examples 
of this learning method include the Back-propagation algorithm 
and the Hebb rule. 

• Unsupervised Learning where the network is exposed to input
patterns from which it must discern meaning and extract features.
The most common type of unsupervised learning is competitive
learning where neurons compete based on the input pattern to
produce an output pattern. Examples include Neural Gas, Learning
Vector Quantization, and the Self-Organizing Map.

Artificial Neural Networks are typically difficult to configure and
slow to train, but once prepared are very fast in application. They
are generally used for function approximation-based problem domains
and prized for their capabilities of generalization and tolerance to noise.
They are known to have the limitation of being opaque, meaning there
is little explanation to the subject matter expert as to why decisions
were made, only how.

There are many excellent reference texts for the field of Artificial
Neural Networks. Some selected texts include: "Neural Networks for
Pattern Recognition" by Bishop [1], "Neural Smithing: Supervised Learning
in Feedforward Artificial Neural Networks" by Reed and Marks II [8],
and "An Introduction to Neural Networks" by Gurney [2].

8.1.3 Extensions 

There are many other algorithms and classes of algorithm from the field
of Artificial Neural Networks that were not described, not limited to:

• Radial Basis Function Network: A network where activation 
functions are controlled by Radial Basis Functions [4] . 

• Neural Gas: Another self-organizing and unsupervised compet- 
itive learning algorithm. Unlike SOM (and more like LVQ), the 
nodes are not organized into a lower-dimensional structure, instead 
the competitive Hebbian-learning like rule is applied to connect, 
order, and adapt nodes in feature space [5-7]. 

• Hierarchical Temporal Memory: A neural network system
based on models of some of the structural and algorithmic
properties of the neocortex [3].



8.2 Perceptron

Perceptron. 

8.2.1 Taxonomy 

The Perceptron algorithm belongs to the field of Artificial Neural Net- 
works and more broadly Computational Intelligence. It is a single layer 
feedforward neural network (single cell network) that inspired many 
extensions and variants, not limited to ADALINE and the Widrow-Hoff 
learning rules. 

8.2.2 Inspiration 

The Perceptron is inspired by the information processing of a single
neural cell (called a neuron). A neuron accepts input signals via its
dendrites, which pass the electrical signal down to the cell body. The axon
carries the signal out to synapses, which are the connections of a cell's
axon to other cells' dendrites. In a synapse, the electrical activity is
converted into molecular activity (neurotransmitter molecules crossing
the synaptic cleft and binding with receptors). The molecular binding
develops an electrical signal which is passed on to the connected cell's
dendrites.

8.2.3 Strategy 

The information processing objective of the technique is to model a given 
function by modifying internal weightings of input signals to produce 
an expected output signal. The system is trained using a supervised 
learning method, where the error between the system's output and a 
known expected output is presented to the system and used to modify 
its internal state. State is maintained in a set of weightings on the input 
signals. The weights are used to represent an abstraction of the mapping 
of input vectors to the output signal for the examples that the system 
was exposed to during training. 

8.2.4 Procedure 

The Perceptron is comprised of a data structure (weights) and separate 
procedures for training and applying the structure. The structure is 
really just a vector of weights (one for each expected input) and a bias 
term. 

Algorithm 8.2.1 provides pseudocode for training the Perceptron.
A weight is initialized for each input plus an additional weight for a
fixed bias constant input that is almost always set to 1.0. The activation
of the network to a given input pattern is calculated as follows:

$$activation \leftarrow \sum_{k=1}^{n} (w_k \times x_{ki}) + w_{bias} \times 1.0 \qquad (8.1)$$

where $n$ is the number of weights and inputs, $x_{ki}$ is the $k^{th}$ attribute
on the $i^{th}$ input pattern, and $w_{bias}$ is the bias weight. The weights are
updated as follows:

$$w_i(t+1) = w_i(t) + \alpha \times (e(t) - a(t)) \times x_i(t) \qquad (8.2)$$

where $w_i$ is the $i^{th}$ weight at time $t$ and $t+1$, $\alpha$ is the learning rate,
$e(t)$ and $a(t)$ are the expected and actual output at time $t$, and $x_i$ is
the $i^{th}$ input. This update process is applied to each weight in turn (as
well as the bias weight with its constant input).
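Equations 8.1 and 8.2 can be traced by hand for a single input pattern. The following is a minimal sketch, not the listing for this section: the helper names (`activate`, `step`, `update`), the starting weights, and the use of a step transfer function to obtain the actual output are assumptions made for the illustration.

```ruby
# Equation 8.1: weighted sum of the inputs plus the bias weight times 1.0.
# The last element of weights is taken to be the bias weight.
def activate(weights, inputs)
  sum = weights.last * 1.0
  inputs.each_with_index { |x, k| sum += weights[k] * x }
  sum
end

# Assumed transfer function: a step that outputs 1.0 when activation >= 0.
def step(activation)
  activation >= 0 ? 1.0 : 0.0
end

# Equation 8.2 applied to every weight, with the bias paired with input 1.0.
def update(weights, inputs, expected, actual, alpha)
  (inputs + [1.0]).each_with_index.map do |x, i|
    weights[i] + alpha * (expected - actual) * x
  end
end

weights = [0.2, -0.4, -0.1] # illustrative values: two inputs and a bias
inputs = [0.0, 1.0]
actual = step(activate(weights, inputs)) # activation is -0.5, so output 0.0
new_weights = update(weights, inputs, 1.0, actual, 0.1)
puts "output=#{actual}, weights=#{new_weights.inspect}"
```

Only the weights whose inputs were non-zero change: the first weight stays at 0.2, while the second weight and the bias each move by the learning rate toward the expected output.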



Algorithm 8.2.1: Pseudocode for the Perceptron.

Input: ProblemSize, InputPatterns, iterations_max, learn_rate
Output: Weights

Weights ← InitializeWeights(ProblemSize);
for i = 1 to iterations_max do
    Pattern_i ← SelectInputPattern(InputPatterns);
    Activation_i ← ActivateNetwork(Pattern_i, Weights);
    Output_i ← TransferActivation(Activation_i);
    UpdateWeights(Pattern_i, Output_i, learn_rate);
end
return Weights;



8.2.5 Heuristics 

• The Perceptron can be used to approximate arbitrary linear
functions and can be used for regression or classification problems.

• The Perceptron cannot learn a non-linear mapping between the
input and output attributes. The XOR problem is a classical
example of a problem that the Perceptron cannot learn.

• Input and output values should be normalized such that x ∈ [0, 1).

• The learning rate (α ∈ [0, 1]) controls the amount of change each
error has on the system; lower learning rates are common, such as
0.1.

• The weights can be updated in an online manner (after the
exposure to each input pattern) or in batch (after a fixed number of
patterns have been observed).

• Batch updates are expected to be more stable than online updates
for some complex problems.

• A bias weight is used with a constant input signal to provide
stability to the learning process.

• A step transfer function is commonly used to transfer the activation
to a binary output value: 1 if activation ≥ 0, otherwise 0.

• It is good practice to expose the system to input patterns in a
different random order each enumeration through the input set.

• The initial weights are typically small random values, such as
∈ [0, 0.5].

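The heuristic that the Perceptron cannot learn XOR can be checked with a short experiment. The sketch below is an illustration under stated assumptions: weights start at zero (rather than small random values) so the run is repeatable, patterns are presented in a fixed order, and the helper name `train_perceptron` is hypothetical. An epoch with no errors means no weight changed during that epoch, so the weights classify every pattern correctly.

```ruby
# Train a two-input perceptron online and return the first epoch (1-based)
# that completes without error, or nil if none does within the budget.
def train_perceptron(patterns, epochs, alpha)
  weights = [0.0, 0.0, 0.0] # two input weights plus the bias weight
  epochs.times do |epoch|
    errors = 0
    patterns.each do |x1, x2, expected|
      activation = weights[2] + weights[0] * x1 + weights[1] * x2
      output = activation >= 0 ? 1 : 0
      delta = expected - output
      errors += delta.abs
      weights[0] += alpha * delta * x1
      weights[1] += alpha * delta * x2
      weights[2] += alpha * delta # the bias input is a constant 1.0
    end
    return epoch + 1 if errors.zero?
  end
  nil
end

or_patterns = [[0, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 1]]
xor_patterns = [[0, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 0]]

puts "OR error-free epoch:  #{train_perceptron(or_patterns, 50, 0.1).inspect}"
puts "XOR error-free epoch: #{train_perceptron(xor_patterns, 50, 0.1).inspect}"
```

The OR problem is linearly separable, so an error-free epoch is reached within the budget. No single set of weights separates XOR, so that run returns nil regardless of how many epochs are allowed.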
8.2.6 Code Listing 

Listing 8.1 provides an example of the Perceptron algorithm implemented 
in the Ruby Programming Language. The problem is the classical 
OR boolean problem, where the inputs of the boolean truth table are 
provided as the two inputs and the result of the boolean OR operation 
is expected as output. 

The algorithm was implemented using an online learning method,
meaning the weights are updated after each input pattern is observed.
A step transfer function is used to convert the activation into a binary
output ∈ {0, 1}. Each epoch enumerates the patterns in the domain to train
the weights, and similarly, the patterns in the domain are enumerated
to demonstrate what the network has learned. A bias weight is used for
stability with a constant input of 1.0.

def random_vector(minmax)
  return Array.new(minmax.size) do |i|
    minmax[i][0] + ((minmax[i][1] - minmax[i][0]) * rand())
  end
end

# one weight per input plus one bias weight, each in [-1, 1]
def initialize_weights(problem_size)
  minmax = Array.new(problem_size + 1) {[-1.0,1.0]}
  return random_vector(minmax)
end

def update_weights(num_inputs, weights, input, out_exp, out_act, l_rate)
  num_inputs.times do |i|
    weights[i] += l_rate * (out_exp - out_act) * input[i]
  end
  # the bias weight is updated against its constant input of 1.0
  weights[num_inputs] += l_rate * (out_exp - out_act) * 1.0
end

def activate(weights, vector)
  sum = weights[weights.size-1] * 1.0
  vector.each_with_index do |input, i|
    sum += weights[i] * input
  end
  return sum
end

# step transfer function
def transfer(activation)
  return (activation >= 0) ? 1.0 : 0.0
end

def get_output(weights, vector)
  activation = activate(weights, vector)
  return transfer(activation)
end

def train_weights(weights, domain, num_inputs, iterations, lrate)
  iterations.times do |epoch|
    error = 0.0
    domain.each do |pattern|
      input = Array.new(num_inputs) {|k| pattern[k].to_f}
      output = get_output(weights, input)
      expected = pattern.last.to_f
      error += (output - expected).abs
      update_weights(num_inputs, weights, input, expected, output, lrate)
    end
    puts "> epoch=#{epoch}, error=#{error}"
  end
end

def test_weights(weights, domain, num_inputs)
  correct = 0
  domain.each do |pattern|
    input_vector = Array.new(num_inputs) {|k| pattern[k].to_f}
    output = get_output(weights, input_vector)
    correct += 1 if output.round == pattern.last
  end
  puts "Finished test with a score of #{correct}/#{domain.size}"
  return correct
end

def execute(domain, num_inputs, iterations, learning_rate)
  weights = initialize_weights(num_inputs)
  train_weights(weights, domain, num_inputs, iterations, learning_rate)
  test_weights(weights, domain, num_inputs)
  return weights
end

if __FILE__ == $0
  # problem configuration
  or_problem = [[0,0,0], [0,1,1], [1,0,1], [1,1,1]]
  inputs = 2
  # algorithm configuration
  iterations = 20
  learning_rate = 0.1
  # execute the algorithm
  execute(or_problem, inputs, iterations, learning_rate)
end

Listing 8.1: Perceptron in Ruby 

8.2.7 References 
Primary Sources 

The Perceptron algorithm was proposed by Rosenblatt in 1958 [3].
Rosenblatt proposed a range of neural network structures and methods.
The 'Perceptron' as it is known is in fact a simplification of Rosenblatt's
models by Minsky and Papert for the purposes of analysis [1]. An early
proof of convergence was provided by Novikoff [2].

Learn More 

Minsky and Papert wrote the classical text titled "Perceptrons" in 1969
that is known to have discredited the approach, suggesting it was limited
to linear discrimination, which reduced research in the area for decades
afterward [1].



8.3 Back-propagation

Back-propagation, Backpropagation, Error Back Propagation, Backprop, 
Delta-rule. 



8.3.1 Taxonomy 

The Back-propagation algorithm is a supervised learning method for 
multi-layer feed-forward networks from the field of Artificial Neural 
Networks and more broadly Computational Intelligence. The name 
refers to the backward propagation of error during the training of 
the network. Back-propagation is the basis for many variations and
extensions for training multi-layer feed-forward networks, not limited to
Vogl's Method (Bold Driver), Delta-Bar-Delta, Quickprop, and Rprop.

8.3.2 Inspiration 

Feed-forward neural networks are inspired by the information processing
of one or more neural cells (called neurons). A neuron accepts input
signals via its dendrites, which pass the electrical signal down to the cell
body. The axon carries the signal out to synapses, which are the
connections of a cell's axon to other cells' dendrites. In a synapse, the
electrical activity is converted into molecular activity (neurotransmitter
molecules crossing the synaptic cleft and binding with receptors). The
molecular binding develops an electrical signal which is passed on to the
connected cell's dendrites. The Back-propagation algorithm is a training
regime for multi-layer feed-forward neural networks and is not directly
inspired by the learning processes of the biological system.

8.3.3 Strategy 

The information processing objective of the technique is to model a given 
function by modifying internal weightings of input signals to produce 
an expected output signal. The system is trained using a supervised 
learning method, where the error between the system's output and a 
known expected output is presented to the system and used to modify 
its internal state. State is maintained in a set of weightings on the input 
signals. The weights are used to represent an abstraction of the mapping 
of input vectors to the output signal for the examples that the system 
was exposed to during training. Each layer of the network provides an
abstraction of the information processing of the previous layer, allowing
the combination of sub-functions and higher-order modeling.

8.3.4 Procedure

The Back-propagation algorithm is a method for training the weights 
in a multi-layer feed-forward neural network. As such, it requires a 
network structure to be defined of one or more layers where one layer 
is fully connected to the next layer. A standard network structure is 
one input layer, one hidden layer, and one output layer. The method is 
primarily concerned with adapting the weights to the calculated error 
in the presence of input patterns, and the method is applied backward 
from the network output layer through to the input layer. 

Algorithm 8.3.1 provides high-level pseudocode for preparing a
network using the Back-propagation training method. A weight is
initialized for each input plus an additional weight for a fixed bias
constant input that is almost always set to 1.0. The activation of a
single neuron to a given input pattern is calculated as follows:

$$activation = \left( \sum_{k=1}^{n} w_k \times x_{ki} \right) + w_{bias} \times 1.0 \qquad (8.3)$$

where $n$ is the number of weights and inputs, $x_{ki}$ is the $k^{th}$ attribute
on the $i^{th}$ input pattern, and $w_{bias}$ is the bias weight. A logistic transfer
function (sigmoid) is used to calculate the output for a neuron ∈ [0, 1]
and provide nonlinearities between the input and output signals:
$\frac{1}{1+\exp(-a)}$, where $a$ represents the neuron activation.
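The forward calculation for one neuron (Equation 8.3 followed by the logistic transfer function) can be sketched as below. The helper names `activate` and `sigmoid` and the sample weights are assumptions chosen for the illustration; the last weight is taken to be the bias weight.

```ruby
# Equation 8.3: weighted sum of the inputs plus the bias weight times 1.0.
def activate(weights, inputs)
  sum = weights.last * 1.0
  inputs.each_with_index { |x, k| sum += weights[k] * x }
  sum
end

# Logistic (sigmoid) transfer function: maps any activation into [0, 1].
def sigmoid(a)
  1.0 / (1.0 + Math.exp(-a))
end

weights = [0.5, -0.5, 0.0] # illustrative values: two inputs and a bias
activation = activate(weights, [1.0, 1.0])
puts "activation=#{activation}, output=#{sigmoid(activation)}"
# The two weighted inputs cancel, so the activation is 0.0 and the output 0.5.
```

Large positive activations approach an output of 1.0 and large negative activations approach 0.0, which is the nonlinearity the text refers to.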

The weight updates use the delta rule, specifically a modified delta 
rule where error is backwardly propagated through the network, starting 
at the output layer and weighted back through the previous layers. The 
following describes the back-propagation of error and weight updates 
for a single pattern. 

An error signal is calculated for each node and propagated back
through the network. For the output nodes this is the error
between the node output and the expected output, scaled by the
derivative of the transfer function:

$$es_i = (c_i - o_i) \times td_i \qquad (8.4)$$

where $es_i$ is the error signal for the $i^{th}$ node, $c_i$ is the expected
output and $o_i$ is the actual output for the $i^{th}$ node. The $td$ term is the
derivative of the output of the $i^{th}$ node. If the sigmoid transfer function
is used, $td_i$ would be $o_i \times (1 - o_i)$. For the hidden nodes, the error signal
is the sum of the weighted error signals from the next layer:

$$es_i = \left( \sum_{k=1}^{n} (w_{ik} \times es_k) \right) \times td_i \qquad (8.5)$$

where $es_i$ is the error signal for the $i^{th}$ node, $w_{ik}$ is the weight
between the $i^{th}$ and the $k^{th}$ nodes, and $es_k$ is the error signal of the
$k^{th}$ node.

The error derivatives for each weight are calculated by combining
the input to each node and the error signal for the node.

$$ed_i = \sum_{k=1}^{n} es_i \times x_k \qquad (8.6)$$

where $ed_i$ is the error derivative for the $i^{th}$ node, $es_i$ is the error
signal for the $i^{th}$ node and $x_k$ is the input from the $k^{th}$ node in the
previous layer. This process includes the bias input that has a constant
value.

Weights are updated in a direction that reduces the error derivative
$ed_i$ (error assigned to the weight), metered by a learning coefficient.

$$w_i(t+1) = w_i(t) + (ed_k \times learn_{rate}) \qquad (8.7)$$

where $w_i(t+1)$ is the updated $i^{th}$ weight, $ed_k$ is the error derivative
for the $k^{th}$ node and $learn_{rate}$ is an update coefficient parameter.



Algorithm 8.3.1: Pseudocode for Back-propagation.

Input: ProblemSize, InputPatterns, iterations_max, learn_rate
Output: Network

Network ← ConstructNetworkLayers();
Network_weights ← InitializeWeights(Network, ProblemSize);
for i = 1 to iterations_max do
    Pattern_i ← SelectInputPattern(InputPatterns);
    Output_i ← ForwardPropagate(Pattern_i, Network);
    BackwardPropagateError(Pattern_i, Output_i, Network);
    UpdateWeights(Pattern_i, Output_i, Network, learn_rate);
end
return Network;



8.3.5 Heuristics

• The Back-propagation algorithm can be used to train a multi-layer
network to approximate arbitrary non-linear functions and can be
used for regression or classification problems.

• Input and output values should be normalized such that x ∈ [0, 1).

• The weights can be updated in an online manner (after the
exposure to each input pattern) or in batch (after a fixed number of
patterns have been observed).

• Batch updates are expected to be more stable than online updates
for some complex problems.

• A logistic (sigmoid) transfer function is commonly used to transfer
the activation to an output value ∈ [0, 1], although other transfer
functions can be used such as the hyperbolic tangent (tanh),
Gaussian, and softmax.

• It is good practice to expose the system to input patterns in a 
different random order each enumeration through the input set. 

• The initial weights are typically small random values <E [0, 0.5]. 

• Typically a small number of layers are used such as 2-4 given that 
the increase in layers result in an increase in the complexity of the 
system and the time required to train the weights. 

• The learning rate can be varied during training, and it is common 
to introduce a momentum term to limit the rate of change. 

• The weights of a given network can be initialized with a global opti- 
mization method before being refined using the Back-propagation 
algorithm. 

• One output node is common for regression problems, where as one 
output node per class is common for classification problems. 
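
The error signal, error derivative and weight update described in the procedure can be traced by hand for a single hidden node. The following is a minimal sketch in Ruby, assuming a logistic transfer function; the helper names (hidden_error_signal, error_derivatives, updated_weights) and all numeric values are hypothetical and chosen only for illustration.

```ruby
# Derivative of the logistic transfer function, written in terms of the
# node's output.
def transfer_derivative(output)
  output * (1.0 - output)
end

# Error signal of a hidden node: the weighted sum of the error signals of
# the next layer, scaled by the node's transfer derivative.
def hidden_error_signal(weights_to_next, next_error_signals, output)
  sum = 0.0
  weights_to_next.each_with_index { |w, k| sum += w * next_error_signals[k] }
  sum * transfer_derivative(output)
end

# Error derivative for each incoming weight; the bias input is fixed at 1.0.
def error_derivatives(error_signal, inputs)
  (inputs + [1.0]).map { |x| error_signal * x }
end

# Move each weight along its error derivative, scaled by the learning rate.
def updated_weights(weights, derivatives, learn_rate)
  weights.each_with_index.map { |w, j| w + (derivatives[j] * learn_rate) }
end

# Hypothetical values: a hidden node with output 0.5 feeding two nodes.
es = hidden_error_signal([0.5, 0.25], [0.2, 0.4], 0.5)
ed = error_derivatives(es, [1.0, 0.0])
w  = updated_weights([0.1, 0.1, 0.1], ed, 0.5)
puts "error signal=#{es}, derivatives=#{ed.inspect}, weights=#{w.inspect}"
```

Note that the weight attached to the zero-valued input is left unchanged, because its error derivative is zero.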

8.3.6 Code Listing 

Listing 8.2 provides an example of the Back-propagation algorithm
implemented in the Ruby Programming Language. The problem is the
classical XOR boolean problem, where the inputs of the boolean truth
table are provided as inputs and the result of the boolean XOR operation
is expected as output. This is a classical problem for Back-propagation
because it was the problem instance referenced by Minsky and Papert
in their analysis of the Perceptron, highlighting the limitations of
their simplified models of neural networks [3].

The algorithm was implemented using a batch learning method,
meaning the weights are updated after each epoch of patterns is
observed. A logistic (sigmoid) transfer function is used to convert the
activation into an output signal. Weight updates occur at the end of
each epoch using the accumulated deltas. A momentum term is used
in conjunction with the past weight update to ensure the last update
influences the current update, reducing large changes.
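
The way the previous update feeds into the current one can be shown on a single weight. This is a small sketch, assuming the update is the learning rate times the accumulated derivative plus the momentum coefficient times the last update; the function name momentum_update and the values used are hypothetical.

```ruby
# Apply one batch update to a single weight with a momentum term.
# Returns the new weight together with the update that was applied, so the
# caller can pass it back in as last_delta on the next epoch.
def momentum_update(weight, deriv, last_delta, lrate, mom)
  delta = (lrate * deriv) + (mom * last_delta)
  [weight + delta, delta]
end

w, last = 0.0, 0.0
# The same accumulated derivative is observed on three consecutive epochs.
3.times do
  w, last = momentum_update(w, 1.0, last, 0.1, 0.5)
  puts "weight=#{w}, update=#{last}"
end
```

Each update is a blend of the current derivative and the previous update, so a single unusual epoch changes the direction of travel only partially.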

A three layer network is demonstrated with 2 nodes in the input
layer (two inputs), 4 nodes in the hidden layer and 1 node in the output
layer, which is sufficient for the chosen problem. A bias weight is used
on each neuron for stability with a constant input of 1.0. The learning
process is separated into four steps: forward propagation, backward
propagation of error, calculation of error derivatives (assigning blame
to the weights) and the weight update. This separation facilitates easy
extensions such as adding a momentum term and/or weight decay to
the update process.

def random_vector(minmax)
  return Array.new(minmax.size) do |i|
    minmax[i][0] + ((minmax[i][1] - minmax[i][0]) * rand())
  end
end

def initialize_weights(num_weights)
  minmax = Array.new(num_weights) { [-rand(), rand()] }
  return random_vector(minmax)
end

# weighted sum of the inputs; the last weight is the bias (input of 1.0)
def activate(weights, vector)
  sum = weights[weights.size-1] * 1.0
  vector.each_with_index do |input, i|
    sum += weights[i] * input
  end
  return sum
end

# logistic (sigmoid) transfer function
def transfer(activation)
  return 1.0 / (1.0 + Math.exp(-activation))
end

def transfer_derivative(output)
  return output * (1.0 - output)
end

def forward_propagate(net, vector)
  net.each_with_index do |layer, i|
    input = (i==0) ? vector :
      Array.new(net[i-1].size) { |k| net[i-1][k][:output] }
    layer.each do |neuron|
      neuron[:activation] = activate(neuron[:weights], input)
      neuron[:output] = transfer(neuron[:activation])
    end
  end
  return net.last[0][:output]
end

def backward_propagate_error(network, expected_output)
  network.size.times do |n|
    index = network.size - 1 - n
    if index == network.size-1
      neuron = network[index][0] # assume one node in output layer
      error = (expected_output - neuron[:output])
      neuron[:delta] = error * transfer_derivative(neuron[:output])
    else
      network[index].each_with_index do |neuron, k|
        sum = 0.0
        # only sum errors weighted by connection to the current k'th neuron
        network[index+1].each do |next_neuron|
          sum += (next_neuron[:weights][k] * next_neuron[:delta])
        end
        neuron[:delta] = sum * transfer_derivative(neuron[:output])
      end
    end
  end
end

# accumulate the error derivative of every weight for the current pattern
def calculate_error_derivatives_for_weights(net, vector)
  net.each_with_index do |layer, i|
    input = (i==0) ? vector :
      Array.new(net[i-1].size) { |k| net[i-1][k][:output] }
    layer.each do |neuron|
      input.each_with_index do |signal, j|
        neuron[:deriv][j] += neuron[:delta] * signal
      end
      neuron[:deriv][-1] += neuron[:delta] * 1.0
    end
  end
end

# batch update with momentum; the accumulated derivatives are then reset
def update_weights(network, lrate, mom=0.8)
  network.each do |layer|
    layer.each do |neuron|
      neuron[:weights].each_with_index do |w, j|
        delta = (lrate * neuron[:deriv][j]) + (neuron[:last_delta][j] * mom)
        neuron[:weights][j] += delta
        neuron[:last_delta][j] = delta
        neuron[:deriv][j] = 0.0
      end
    end
  end
end

def train_network(network, domain, num_inputs, iterations, lrate)
  correct = 0
  iterations.times do |epoch|
    domain.each do |pattern|
      vector = Array.new(num_inputs) { |k| pattern[k].to_f }
      expected = pattern.last
      output = forward_propagate(network, vector)
      correct += 1 if output.round == expected
      backward_propagate_error(network, expected)
      calculate_error_derivatives_for_weights(network, vector)
    end
    update_weights(network, lrate)
    if (epoch+1).modulo(100) == 0
      puts "> epoch=#{epoch+1}, Correct=#{correct}/#{100*domain.size}"
      correct = 0
    end
  end
end

def test_network(network, domain, num_inputs)
  correct = 0
  domain.each do |pattern|
    input_vector = Array.new(num_inputs) { |k| pattern[k].to_f }
    output = forward_propagate(network, input_vector)
    correct += 1 if output.round == pattern.last
  end
  puts "Finished test with a score of #{correct}/#{domain.length}"
  return correct
end

# one extra weight per neuron is reserved for the bias
def create_neuron(num_inputs)
  return {:weights=>initialize_weights(num_inputs+1),
          :last_delta=>Array.new(num_inputs+1) {0.0},
          :deriv=>Array.new(num_inputs+1) {0.0}}
end

def execute(domain, num_inputs, iterations, num_nodes, lrate)
  network = []
  network << Array.new(num_nodes) { create_neuron(num_inputs) }
  network << Array.new(1) { create_neuron(network.last.size) }
  puts "Topology: #{num_inputs} #{network.inject(""){|m,i| m+"#{i.size} "}}"
  train_network(network, domain, num_inputs, iterations, lrate)
  test_network(network, domain, num_inputs)
  return network
end

if __FILE__ == $0
  # problem configuration
  xor = [[0,0,0], [0,1,1], [1,0,1], [1,1,0]]
  inputs = 2
  # algorithm configuration
  learning_rate = 0.3
  num_hidden_nodes = 4
  iterations = 2000
  # execute the algorithm
  execute(xor, inputs, iterations, num_hidden_nodes, learning_rate)
end



Listing 8.2: Back-propagation in Ruby 
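
The description of the listing mentions weight decay as an easy extension of the update step. The following is one possible sketch of such an extension, not part of the original listing: the function name update_weights_with_decay and the decay parameter are hypothetical, and folding the decay term into the stored last update (so that momentum carries it forward) is a design assumption of this sketch.

```ruby
# Hypothetical variant of the weight update with weight decay: in addition
# to the derivative and momentum terms, each weight is shrunk toward zero
# in proportion to its current value.
def update_weights_with_decay(network, lrate, mom = 0.8, decay = 0.01)
  network.each do |layer|
    layer.each do |neuron|
      neuron[:weights].each_with_index do |w, j|
        delta = (lrate * neuron[:deriv][j]) +
                (neuron[:last_delta][j] * mom) -
                (lrate * decay * w)
        neuron[:weights][j] += delta
        neuron[:last_delta][j] = delta
        neuron[:deriv][j] = 0.0 # reset the accumulated derivative
      end
    end
  end
end

# A single neuron with zero accumulated derivatives, so only the decay acts.
neuron = { weights: [1.0, -2.0], deriv: [0.0, 0.0], last_delta: [0.0, 0.0] }
update_weights_with_decay([[neuron]], 0.5, 0.8, 0.1)
puts neuron[:weights].inspect
```

With no error signal, both weights move a small step toward zero, which is the intended effect of the decay term.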



8.3.7 References 
Primary Sources 

The backward propagation of error method is credited to Bryson and 
Ho in [1]. It was applied to the training of multi-layer networks and 
called back-propagation by Rumelhart, Hinton and Williams in 1986 
[5, 6]. This effort and the collection of studies edited by Rumelhart and 
McClelland helped to define the field of Artificial Neural Networks in 
the late 1980s [7, 8].

Learn More

A seminal book on the approach was "Backpropagation: theory,
architectures, and applications" by Chauvin and Rumelhart that provided
an excellent introduction (chapter 1) but also a collection of studies
applying and extending the approach [2]. Reed and Marks provide
an excellent treatment of feed-forward neural networks called "Neural
Smithing" that includes chapters dedicated to Back-propagation, the
configuration of its parameters, error surface and speed improvements
[4].

8.4 Hopfield Network

Hopfield Network, HN, Hopfield Model. 

8.4.1 Taxonomy 

The Hopfield Network is a Neural Network and belongs to the field of
Artificial Neural Networks and Neural Computation. It is a Recurrent
Neural Network and is related to other recurrent networks such as the
Bidirectional Associative Memory (BAM). It is generally related to
feed-forward Artificial Neural Networks such as the Perceptron
(Section 8.2) and the Back-propagation algorithm (Section 8.3).

8.4.2 Inspiration 

The Hopfield Network algorithm is inspired by the associative memory
properties of the human brain.

8.4.3 Metaphor 

Through the training process, the weights in the network may be thought 
to minimize an energy function and slide down an energy surface. In 
a trained network, each pattern presented to the network provides an 
attractor, where progress is made towards the point of attraction by 
propagating information around the network. 

8.4.4 Strategy 

The information processing objective of the system is to associate the 
components of an input pattern with a holistic representation of the 
pattern called Content Addressable Memory (CAM). This means that 
once trained, the system will recall whole patterns, given a portion or a 
noisy version of the input pattern. 

8.4.5 Procedure 

The Hopfield Network is comprised of a graph data structure with 
weighted edges and separate procedures for training and applying the 
structure. The network structure is fully connected (a node connects to 
all other nodes except itself) and the edges (weights) between the nodes 
are bidirectional. 

The weights of the network can be learned via a one-shot method
(one iteration through the patterns) if all patterns to be memorized by
the network are known. Alternatively, the weights can be updated
incrementally using the Hebb rule where weights are increased or decreased
based on the difference between the actual and the expected output.
The one-shot calculation of the network weights for a single node occurs
as follows:

    w_{i,j} = \sum_{k=1}^{N} v_k^i \times v_k^j    (8.8)

where w_{i,j} is the weight between neuron i and j, N is the number
of input patterns, v is the input pattern and v_k^i is the i-th attribute
on the k-th input pattern.
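
The one-shot calculation can be sketched directly from this definition. The example below is a minimal illustration in Ruby, assuming patterns are flat arrays of -1/1 values; the function name one_shot_weights and the two 4-attribute patterns are hypothetical.

```ruby
# One-shot weights: w[i][j] is the sum over all patterns of v[i] * v[j].
# Self-connections stay at zero because a node does not connect to itself.
def one_shot_weights(patterns)
  n = patterns.first.size
  weights = Array.new(n) { Array.new(n, 0.0) }
  n.times do |i|
    ((i + 1)...n).each do |j|
      wij = patterns.inject(0.0) { |sum, v| sum + (v[i] * v[j]) }
      weights[i][j] = wij
      weights[j][i] = wij # edges are bidirectional
    end
  end
  weights
end

# Two hypothetical patterns with four attributes each.
weights = one_shot_weights([[1, -1, 1, -1], [1, 1, -1, -1]])
weights.each { |row| puts row.inspect }
```

Attributes that agree in sign across the patterns receive a positive weight, attributes that disagree receive a negative weight, and mixed evidence cancels out to zero.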

The propagation of the information through the network can be
asynchronous, where a random node is selected each iteration, or
synchronous, where the output is calculated for each node before being
applied to the whole network. Propagation of the information continues
until no more changes are made or until a maximum number of iterations
has completed, after which the output pattern from the network can be
read. The activation for a single node is calculated as follows:

    n_i = \sum_{j=1}^{n} w_{i,j} \times n_j    (8.9)

where n_i is the activation of the i-th neuron, w_{i,j} is the weight
between the nodes i and j, and n_j is the output of the j-th neuron.
The activation is transferred into an output using a transfer function,
typically a step function as follows:

    transfer(n_i) = \begin{cases} 1 & \text{if } n_i \geq \theta \\ -1 & \text{if } n_i < \theta \end{cases}

where the threshold θ is typically fixed at 0.
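
The two propagation styles can be compared on a very small network. This is a sketch under stated assumptions: the 3-node weight matrix is a hypothetical example that stores the single pattern [1, -1, 1], and the function names (activation, propagate_async!, propagate_sync) are illustrative rather than taken from the listing.

```ruby
# Step transfer function with the threshold fixed at 0.
def transfer(activation)
  activation >= 0 ? 1 : -1
end

# Activation of node i: weighted sum of the outputs of all other nodes.
def activation(weights, outputs, i)
  sum = 0.0
  outputs.each_with_index { |out, j| sum += weights[i][j] * out if i != j }
  sum
end

# Asynchronous propagation: one randomly selected node is updated in place.
def propagate_async!(weights, outputs, rng)
  i = rng.rand(outputs.size)
  outputs[i] = transfer(activation(weights, outputs, i))
  outputs
end

# Synchronous propagation: all outputs are computed from the same snapshot
# of the state and only then applied together.
def propagate_sync(weights, outputs)
  Array.new(outputs.size) { |i| transfer(activation(weights, outputs, i)) }
end

# Hypothetical weights that store the single pattern [1, -1, 1].
weights = [[0, -1, 1], [-1, 0, -1], [1, -1, 0]]

async_state = [1, 1, 1] # the stored pattern with the middle value flipped
rng = Random.new(42)
100.times { propagate_async!(weights, async_state, rng) }

sync_state = propagate_sync(weights, [1, 1, 1])
puts "async=#{async_state.inspect}, sync=#{sync_state.inspect}"
```

In this small case both styles repair the flipped value; in general the asynchronous form needs enough iterations for every unstable node to be selected at least once.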

8.4.6 Heuristics 

• The Hopfield network may be used to solve the recall problem of
matching cues for an input pattern to an associated pre-learned
pattern.

• The transfer function for turning the activation of a neuron into
an output is typically a step function f(a) ∈ {-1, 1} (preferred),
or more traditionally f(a) ∈ {0, 1}.

• The input vectors are typically normalized to boolean values
x ∈ {-1, 1}.

• The network can be propagated asynchronously (where a random
node is selected and output generated), or synchronously (where
the output for all nodes is calculated before being applied).

• Weights can be learned in a one-shot or incremental method based
on how much information is known about the patterns to be
learned.

• All neurons in the network are typically both input and output
neurons, although other network topologies have been investigated
(such as the designation of input and output neurons).

• A Hopfield network has limits on the patterns it can store and
retrieve accurately from memory, described by N < 0.15 × n where
N is the number of patterns that can be stored and retrieved and
n is the number of nodes in the network.

8.4.7 Code Listing 

Listing 8.3 provides an example of the Hopfield Network algorithm
implemented in the Ruby Programming Language. The problem is an
instance of a recall problem where patterns are described in terms of a
3x3 matrix of binary values (∈ {-1, 1}). Once the network has learned
the patterns, the system is exposed to perturbed versions of the patterns
(with errors introduced) and must respond with the correct pattern.
Two patterns are used in this example, specifically 'T' and 'U'.

The algorithm is an implementation of the Hopfield Network with a
one-shot training method for the network weights, given that all patterns
are already known. The information is propagated through the network
using an asynchronous method, which is repeated for a fixed number of
iterations. The patterns are displayed to the console during the testing
of the network, with the outputs converted from {-1, 1} to {0, 1} for
readability.

def random_vector(minmax)
  return Array.new(minmax.size) do |i|
    minmax[i][0] + ((minmax[i][1] - minmax[i][0]) * rand())
  end
end

def initialize_weights(problem_size)
  minmax = Array.new(problem_size) { [-0.5, 0.5] }
  return random_vector(minmax)
end

def create_neuron(num_inputs)
  neuron = {}
  neuron[:weights] = initialize_weights(num_inputs)
  return neuron
end

# step transfer function with a threshold of 0
def transfer(activation)
  return (activation >= 0) ? 1 : -1
end

# asynchronous update of one random node; reports whether its output changed
def propagate_was_change?(neurons)
  i = rand(neurons.size)
  activation = 0
  neurons.each_with_index do |other, j|
    activation += other[:weights][i]*other[:output] if i!=j
  end
  output = transfer(activation)
  change = output != neurons[i][:output]
  neurons[i][:output] = output
  return change
end

def get_output(neurons, pattern, evals=100)
  vector = pattern.flatten
  neurons.each_with_index { |neuron, i| neuron[:output] = vector[i] }
  evals.times { propagate_was_change?(neurons) }
  return Array.new(neurons.size) { |i| neurons[i][:output] }
end

# one-shot calculation of the symmetric weights from all patterns
def train_network(neurons, patterns)
  neurons.each_with_index do |neuron, i|
    for j in ((i+1)...neurons.size) do
      next if i==j
      wij = 0.0
      patterns.each do |pattern|
        vector = pattern.flatten
        wij += vector[i]*vector[j]
      end
      neurons[i][:weights][j] = wij
      neurons[j][:weights][i] = wij
    end
  end
end

def to_binary(vector)
  return Array.new(vector.size) { |i| ((vector[i]==-1) ? 0 : 1) }
end

def print_patterns(provided, expected, actual)
  p, e, a = to_binary(provided), to_binary(expected), to_binary(actual)
  p1, p2, p3 = p[0..2].join(', '), p[3..5].join(', '), p[6..8].join(', ')
  e1, e2, e3 = e[0..2].join(', '), e[3..5].join(', '), e[6..8].join(', ')
  a1, a2, a3 = a[0..2].join(', '), a[3..5].join(', '), a[6..8].join(', ')
  puts "Provided   Expected   Got"
  puts "#{p1}    #{e1}    #{a1}"
  puts "#{p2}    #{e2}    #{a2}"
  puts "#{p3}    #{e3}    #{a3}"
end

# number of attributes that differ between the two vectors
def calculate_error(expected, actual)
  sum = 0
  expected.each_with_index do |v, i|
    sum += 1 if expected[i]!=actual[i]
  end
  return sum
end

# flip num_errors distinct, randomly chosen attributes
def perturb_pattern(vector, num_errors=1)
  perturbed = Array.new(vector)
  indices = [rand(perturbed.size)]
  while indices.size < num_errors do
    index = rand(perturbed.size)
    indices << index if !indices.include?(index)
  end
  indices.each { |i| perturbed[i] = ((perturbed[i]==1) ? -1 : 1) }
  return perturbed
end

def test_network(neurons, patterns)
  error = 0.0
  patterns.each do |pattern|
    vector = pattern.flatten
    perturbed = perturb_pattern(vector)
    output = get_output(neurons, perturbed)
    error += calculate_error(vector, output)
    print_patterns(perturbed, vector, output)
  end
  error = error / patterns.size.to_f
  puts "Final Result: avg pattern error=#{error}"
  return error
end

def execute(patterns, num_inputs)
  neurons = Array.new(num_inputs) { create_neuron(num_inputs) }
  train_network(neurons, patterns)
  test_network(neurons, patterns)
  return neurons
end

if __FILE__ == $0
  # problem configuration
  num_inputs = 9
  p1 = [[1,1,1], [-1,1,-1], [-1,1,-1]] # T
  p2 = [[1,-1,1], [1,-1,1], [1,1,1]] # U
  patterns = [p1, p2]
  # execute the algorithm
  execute(patterns, num_inputs)
end



Listing 8.3: Hopfield Network in Ruby 



8.4.8 References 
Primary Sources 

The Hopfield Network was proposed by Hopfield in 1982 where the 
basic model was described and related to an abstraction of the inspiring
biological system [2]. This early work was extended by Hopfield to
'graded' neurons capable of outputting a continuous value through 
use of a logistic (sigmoid) transfer function [3]. An innovative work 
by Hopfield and Tank considered the use of the Hopfield network for 
solving combinatorial optimization problems, with a specific study into 
the system applied to instances of the Traveling Salesman Problem [4]. 
This was achieved with a large number of neurons and a representation 
that decoded the position of each city in the tour as a sub-problem on 
which a customized network energy function had to be minimized. 

Learn More 

Popovici and Boncut provide a summary of the Hopfield Network
algorithm with worked examples [5]. Overviews of the Hopfield Network
are provided in most good books on Artificial Neural Networks, such
as [6]. Hertz, Krogh, and Palmer present an in-depth study of the field
of Artificial Neural Networks with a detailed treatment of the Hopfield
network from a statistical mechanics perspective [1].

8.5 Learning Vector Quantization

Learning Vector Quantization, LVQ. 

8.5.1 Taxonomy 

The Learning Vector Quantization algorithm belongs to the field of
Artificial Neural Networks and Neural Computation. More broadly, it
belongs to the field of Computational Intelligence. The Learning Vector
Quantization algorithm is a supervised neural network that uses a
competitive (winner-take-all) learning strategy. It is related to other
supervised neural networks such as the Perceptron (Section 8.2) and the
Back-propagation algorithm (Section 8.3). It is related to other
competitive learning neural networks such as the Self-Organizing Map
algorithm (Section 8.6), which is a similar algorithm for unsupervised
learning with the addition of connections between the neurons.
Additionally, LVQ is a baseline technique that was defined with a few
variants (LVQ1, LVQ2, LVQ2.1, LVQ3, OLVQ1, and OLVQ3) as well as many
third-party extensions and refinements too numerous to list.

8.5.2 Inspiration 

The Learning Vector Quantization algorithm is related to the Self- 
Organizing Map which is in turn inspired by the self-organizing capabil- 
ities of neurons in the visual cortex. 

8.5.3 Strategy 

The information processing objective of the algorithm is to prepare a 
set of codebook (or prototype) vectors in the domain of the observed 
input data samples and to use these vectors to classify unseen examples. 
An initially random pool of vectors is prepared which are then exposed 
to training samples. A winner-take-all strategy is employed where 
one or more of the most similar vectors to a given input pattern are 
selected and adjusted to be closer to the input vector, and in some cases, 
further away from the winner for runners up. The repetition of this 
process results in the distribution of codebook vectors in the input space 
which approximate the underlying distribution of samples from the test 
dataset. 

8.5.4 Procedure 

Vector Quantization is a technique from signal processing where density
functions are approximated with prototype vectors for applications such
as compression. Learning Vector Quantization is similar in principle,
although the prototype vectors are learned through a supervised
winner-take-all method.

Algorithm 8.5.1 provides a high-level pseudocode for preparing
codebook vectors using the Learning Vector Quantization method. Codebook
vectors are initialized to small floating point values, or sampled from
an available dataset. The Best Matching Unit (BMU) is the codebook
vector from the pool that has the minimum distance to an input
vector. A distance measure between input patterns must be defined. For
real-valued vectors, this is commonly the Euclidean distance:

    dist(x, c) = \sum_{i=1}^{n} (x_i - c_i)^2    (8.10)

where n is the number of attributes, x is the input vector and c is a
given codebook vector.
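
Selecting the Best Matching Unit amounts to a minimum search over the pool using this distance. The sketch below is illustrative only: the function name best_matching_unit and the two-vector codebook are hypothetical, and the square root is applied as in a conventional Euclidean distance (leaving it out would not change which vector is closest).

```ruby
# Distance between an input vector and a codebook vector.
def euclidean_distance(x, c)
  Math.sqrt(x.each_index.inject(0.0) { |sum, i| sum + ((x[i] - c[i])**2) })
end

# The Best Matching Unit is the codebook vector closest to the input.
def best_matching_unit(codebook_vectors, input)
  codebook_vectors.min_by { |c| euclidean_distance(input, c[:vector]) }
end

# Hypothetical codebook with one vector per class.
codebook = [
  { label: "A", vector: [0.2, 0.2] },
  { label: "B", vector: [0.8, 0.8] }
]
bmu = best_matching_unit(codebook, [0.3, 0.1])
puts "BMU label=#{bmu[:label]}"
```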



Algorithm 8.5.1: Pseudocode for LVQ1.

Input: ProblemSize, InputPatterns, iterations_max,
       CodebookVectors_num, learn_rate
Output: CodebookVectors

CodebookVectors ← InitializeCodebookVectors(CodebookVectors_num,
                                            ProblemSize);
for i = 1 to iterations_max do
    Pattern_i ← SelectInputPattern(InputPatterns);
    Bmu_i ← SelectBestMatchingUnit(Pattern_i, CodebookVectors);
    foreach Bmu_i^attribute ∈ Bmu_i do
        if Bmu_i^class = Pattern_i^class then
            Bmu_i^attribute ← Bmu_i^attribute + learn_rate ×
                (Pattern_i^attribute - Bmu_i^attribute)
        else
            Bmu_i^attribute ← Bmu_i^attribute - learn_rate ×
                (Pattern_i^attribute - Bmu_i^attribute)
        end
    end
end
return CodebookVectors;
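
The inner loop of the pseudocode can be exercised on a single codebook vector to see the attract and repel behaviour. This is a minimal sketch; the function name lvq1_update and the example vectors are hypothetical.

```ruby
# LVQ1 update for one codebook vector: move toward the input when the class
# labels match, and away from it when they differ.
def lvq1_update(bmu, pattern, learn_rate)
  direction = bmu[:label] == pattern[:label] ? 1.0 : -1.0
  bmu[:vector] = bmu[:vector].each_with_index.map do |c, i|
    c + (direction * learn_rate * (pattern[:vector][i] - c))
  end
  bmu
end

pattern = { label: "A", vector: [1.0, 1.0] }
same  = lvq1_update({ label: "A", vector: [0.0, 0.0] }, pattern, 0.5)
other = lvq1_update({ label: "B", vector: [0.0, 0.0] }, pattern, 0.5)
puts "same class:  #{same[:vector].inspect}"
puts "other class: #{other[:vector].inspect}"
```

A matching vector moves halfway toward the input, while a vector of the wrong class is pushed the same distance in the opposite direction.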



8.5.5 Heuristics 

• Learning Vector Quantization was designed for classification
problems that have existing data sets that can be used to supervise the
learning by the system. The algorithm does not support regression
problems.

• LVQ is non-parametric, meaning that it does not rely on
assumptions about the structure of the function that it is approximating.

• Real-values in input vectors should be normalized such that
x ∈ [0, 1).

• Euclidean distance is commonly used to measure the distance
between real-valued vectors, although other distance measures
may be used (such as dot product), and data specific distance
measures may be required for non-scalar attributes.

• There should be sufficient training iterations to expose all the
training data to the model multiple times.

• The learning rate is typically linearly decayed over the training
period from an initial value to close to zero.

• The more complex the class distribution, the more codebook
vectors will be required; some problems may need thousands.

• Multiple passes of the LVQ training algorithm are suggested for
more robust usage, where the first pass has a large learning rate
to prepare the codebook vectors and the second pass has a low
learning rate and runs for a long time (perhaps 10-times more
iterations).
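
The linear decay and the two-pass suggestion can be combined into a simple schedule. The following sketch assumes a decay of the form initial × (1 − t / T); the function name decayed_learning_rate and the specific rates and iteration counts are hypothetical values chosen for illustration.

```ruby
# Linearly decay the learning rate from its initial value toward zero.
def decayed_learning_rate(initial_rate, iteration, max_iterations)
  initial_rate * (1.0 - (iteration.to_f / max_iterations.to_f))
end

# A hypothetical two-pass schedule: a short pass with a large rate, then a
# pass with ten times the iterations and a small rate.
schedule = [
  { initial_rate: 0.3,  iterations: 1000 },
  { initial_rate: 0.05, iterations: 10_000 }
]

rates = schedule.map do |pass|
  Array.new(pass[:iterations]) do |t|
    decayed_learning_rate(pass[:initial_rate], t, pass[:iterations])
  end
end

rates.each_with_index do |pass_rates, i|
  puts "pass #{i + 1}: first=#{pass_rates.first}, last=#{pass_rates.last}"
end
```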



8.5.6 Code Listing 

Listing 8.4 provides an example of the Learning Vector Quantization
algorithm implemented in the Ruby Programming Language. The
problem is a contrived classification problem in a 2-dimensional domain
x ∈ [0, 1], y ∈ [0, 1] with two classes: 'A' (x ∈ [0, 0.4999999],
y ∈ [0, 0.4999999]) and 'B' (x ∈ [0.5, 1], y ∈ [0.5, 1]).

The algorithm was implemented using the LVQ1 variant where the 
best matching codebook vector is located and moved toward the input 
vector if it is the same class, or away if the classes differ. A linear decay 
was used for the learning rate that was updated after each pattern was 
exposed to the model. The implementation can easily be extended to 
the other variants of the method. 



def random_vector(minmax)
  return Array.new(minmax.size) do |i|
    minmax[i][0] + ((minmax[i][1] - minmax[i][0]) * rand())
  end
end

# sample a labelled pattern uniformly from the bounds of a random class
def generate_random_pattern(domain)
  classes = domain.keys
  selected_class = rand(classes.size)
  pattern = {:label=>classes[selected_class]}
  pattern[:vector] = random_vector(domain[classes[selected_class]])
  return pattern
end

# codebook vectors start at random positions with random class labels
def initialize_vectors(domain, num_vectors)
  classes = domain.keys
  codebook_vectors = []
  num_vectors.times do
    selected_class = rand(classes.size)
    codebook = {}
    codebook[:label] = classes[selected_class]
    codebook[:vector] = random_vector([[0,1],[0,1]])
    codebook_vectors << codebook
  end
  return codebook_vectors
end

def euclidean_distance(c1, c2)
  sum = 0.0
  c1.each_index { |i| sum += (c1[i]-c2[i])**2.0 }
  return Math.sqrt(sum)
end

def get_best_matching_unit(codebook_vectors, pattern)
  best, b_dist = nil, nil
  codebook_vectors.each do |codebook|
    dist = euclidean_distance(codebook[:vector], pattern[:vector])
    best, b_dist = codebook, dist if b_dist.nil? or dist<b_dist
  end
  return best
end

# move toward the pattern if the classes match, otherwise move away
def update_codebook_vector(bmu, pattern, lrate)
  bmu[:vector].each_with_index do |v, i|
    error = pattern[:vector][i]-bmu[:vector][i]
    if bmu[:label] == pattern[:label]
      bmu[:vector][i] += lrate * error
    else
      bmu[:vector][i] -= lrate * error
    end
  end
end

def train_network(codebook_vectors, domain, iterations, learning_rate)
  iterations.times do |iter|
    pat = generate_random_pattern(domain)
    bmu = get_best_matching_unit(codebook_vectors, pat)
    # linear decay of the learning rate
    lrate = learning_rate * (1.0-(iter.to_f/iterations.to_f))
    if iter.modulo(10)==0
      puts "> iter=#{iter}, got=#{bmu[:label]}, exp=#{pat[:label]}"
    end
    update_codebook_vector(bmu, pat, lrate)
  end
end

def test_network(codebook_vectors, domain, num_trials=100)
  correct = 0
  num_trials.times do
    pattern = generate_random_pattern(domain)
    bmu = get_best_matching_unit(codebook_vectors, pattern)
    correct += 1 if bmu[:label] == pattern[:label]
  end
  puts "Done. Score: #{correct}/#{num_trials}"
  return correct
end

def execute(domain, iterations, num_vectors, learning_rate)
  codebook_vectors = initialize_vectors(domain, num_vectors)
  train_network(codebook_vectors, domain, iterations, learning_rate)
  test_network(codebook_vectors, domain)
  return codebook_vectors
end

if __FILE__ == $0
  # problem configuration
  domain = {"A"=>[[0,0.4999999],[0,0.4999999]], "B"=>[[0.5,1],[0.5,1]]}
  # algorithm configuration
  learning_rate = 0.3
  iterations = 1000
  num_vectors = 20
  # execute the algorithm
  execute(domain, iterations, num_vectors, learning_rate)
end

Listing 8.4: Learning Vector Quantization in Ruby 

8.5.7 References 
Primary Sources 

The Learning Vector Quantization algorithm was described by Kohonen
in 1988 [2], and was further described in the same year by Kohonen [1]
and benchmarked by Kohonen, Barna, and Chrisley [5].

Learn More

Kohonen provides a detailed overview of the state of LVQ algorithms
and variants (LVQ1, LVQ2, and LVQ2.1) [3]. The technical report
that comes with the LVQ_PAK software (written by Kohonen and his
students) provides both an excellent summary of the technique and
its main variants, as well as summarizing the important considerations
when applying the approach [6]. The seminal book on Learning Vector
Quantization and the Self-Organizing Map is "Self-Organizing Maps"
by Kohonen, which includes a chapter (Chapter 6) dedicated to LVQ
and its variants [4].

8.5.8 Bibliography

8.6 Self-Organizing Map

Self-Organizing Map, SOM, Self-Organizing Feature Map, SOFM,
Kohonen Map, Kohonen Network.

8.6.1 Taxonomy

The Self-Organizing Map algorithm belongs to the field of Artificial
Neural Networks and Neural Computation. More broadly it belongs
to the field of Computational Intelligence. The Self-Organizing Map
is an unsupervised neural network that uses a competitive
(winner-take-all) learning strategy. It is related to other unsupervised
neural networks such as the Adaptive Resonance Theory (ART) method. It is
related to other competitive learning neural networks such as the
Neural Gas Algorithm, and the Learning Vector Quantization algorithm
(Section 8.5), which is a similar algorithm for classification without
connections between the neurons. Additionally, SOM is a baseline
technique that has inspired many variations and extensions, not limited
to the Adaptive-Subspace Self-Organizing Map (ASSOM).

8.6.2 Inspiration 

The Self-Organizing Map is inspired by postulated feature maps of 
neurons in the brain comprised of feature-sensitive cells that provide 
ordered projections between neuronal layers, such as those that may 
exist in the retina and cochlea. For example, there are acoustic feature 
maps that respond to sounds to which an animal is most frequently 
exposed, and tonotopic maps that may be responsible for the order 
preservation of acoustic resonances. 

8.6.3 Strategy 

The information processing objective of the algorithm is to optimally 
place a topology (grid or lattice) of codebook or prototype vectors in the 
domain of the observed input data samples. An initially random pool 
of vectors is prepared which are then exposed to training samples. A 
winner-take-all strategy is employed where the most similar vector to a 
given input pattern is selected, then the selected vector and neighbors of
the selected vector are updated to more closely resemble the input
pattern. The repetition of this process results in the distribution of
codebook vectors in the input space which approximate the underlying
distribution of samples from the test dataset. The result is the mapping
of the topology of codebook vectors to the underlying structure in the
input samples which may be summarized or visualized to reveal
topologically preserved features from the input space in a
low-dimensional projection.

8.6.4 Procedure

The Self-Organizing Map is comprised of a collection of codebook vectors
connected together in a topological arrangement, typically a
one-dimensional line or a two-dimensional grid. The codebook vectors themselves
represent prototypes (points) within the domain, whereas the topological 
structure imposes an ordering between the vectors during the training 
process. The result is a low dimensional projection or approximation of 
the problem domain which may be visualized, or from which clusters 
may be extracted. 

Algorithm 8.6.1 provides a high-level pseudocode for preparing code- 
book vectors using the Self-Organizing Map method. Codebook vectors 
are initialized to small floating point values, or sampled from the domain. 
The Best Matching Unit (BMU) is the codebook vector from the pool 
that has the minimum distance to an input vector. A distance measure 
between input patterns must be defined. For real-valued vectors, this is 
commonly the Euclidean distance: 

    dist(x, c) = \sum_{i=1}^{n} (x_i - c_i)^2    (8.11)

where n is the number of attributes, x is the input vector and c is a 
given codebook vector. 

The neighbors of the BMU in the topological structure of the network 
are selected using a neighborhood size that is linearly decreased during 
the training of the network. The BMU and all selected neighbors are 
then adjusted toward the input vector using a learning rate that too is 
decreased linearly with the training cycles: 

$$c_i(t+1) = c_i(t) + learn_{rate}(t) \times (x_i - c_i(t)) \qquad (8.12)$$

where c_i(t) is the i-th attribute of a codebook vector at time t, 
learn_rate is the current learning rate, and x_i is the i-th attribute of an 
input vector. 
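
As a small worked sketch of the two ideas above, the following Ruby fragment computes a linearly decreasing learning rate and applies the additive update used in Algorithm 8.6.1 to a single attribute. The helper names linear_decay and update_attribute are assumptions made for this illustration and do not appear in the listings of this book.

```ruby
# Linearly decrease a parameter from its initial value toward zero.
# (Hypothetical helper name, chosen for this sketch.)
def linear_decay(initial, iteration, max_iterations)
  initial * (1.0 - (iteration.to_f / max_iterations.to_f))
end

# Move one codebook attribute c toward the input attribute x.
# (Hypothetical helper name, chosen for this sketch.)
def update_attribute(c, x, learn_rate)
  c + learn_rate * (x - c)
end
```

For example, with an initial learning rate of 0.3 and 100 iterations, the rate at iteration 50 is half of the initial value, and an attribute at 0.2 updated toward an input of 0.6 with a rate of 0.5 moves half of the way to the input.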

The neighborhood is typically square (called bubble) where all neigh- 
borhood nodes are updated using the same learning rate for the iteration, 
or Gaussian where the learning rate is proportional to the neighborhood 
distance using a Gaussian distribution (neighbors further away from the 
BMU are updated less). 
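
The difference between the two neighborhood types can be sketched as a weight that scales the learning rate for each node. This is a minimal illustration only: the function names are hypothetical, and using the neighborhood radius as the standard deviation of the Gaussian is an assumption of the sketch rather than something prescribed by the text.

```ruby
# Bubble neighborhood: every node within the radius receives the full
# learning rate, every node outside receives none.
def bubble_weight(grid_dist, radius)
  grid_dist <= radius ? 1.0 : 0.0
end

# Gaussian neighborhood: the weight decays smoothly with grid distance.
# Assumption: the radius is used as the standard deviation.
def gaussian_weight(grid_dist, radius)
  return (grid_dist == 0.0 ? 1.0 : 0.0) if radius <= 0.0
  Math.exp(-(grid_dist**2) / (2.0 * radius**2))
end
```

In either case the effective learning rate for a node is the current learning rate multiplied by the weight, so the BMU itself (grid distance zero) always receives the full rate.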

8.6.5 Heuristics 

• The Self-Organizing Map was designed for unsupervised learning 
problems such as feature extraction, visualization and clustering. 
Some extensions of the approach can label the prepared codebook 
vectors which can be used for classification. 
Algorithm 8.6.1: Pseudocode for the SOM. 

Input: InputPatterns, iterations_max, learn_rate^init, neighborhood_size^init, 
       Grid_width, Grid_height 
Output: CodebookVectors 

CodebookVectors ← InitializeCodebookVectors(Grid_width, Grid_height, InputPatterns); 
for i = 1 to iterations_max do 
    learn_rate^i ← CalculateLearningRate(i, learn_rate^init); 
    neighborhood_size^i ← CalculateNeighborhoodSize(i, neighborhood_size^init); 
    Pattern_i ← SelectInputPattern(InputPatterns); 
    Bmu_i ← SelectBestMatchingUnit(Pattern_i, CodebookVectors); 
    Neighborhood ← Bmu_i; 
    Neighborhood ← SelectNeighbors(Bmu_i, CodebookVectors, neighborhood_size^i); 
    foreach Vector_i ∈ Neighborhood do 
        foreach Vector_i^attribute ∈ Vector_i do 
            Vector_i^attribute ← Vector_i^attribute + learn_rate^i × 
                (Pattern_i^attribute − Vector_i^attribute) 
        end 
    end 
end 
return CodebookVectors; 



• SOM is non-parametric, meaning that it does not rely on assumptions 
about the structure of the function that it is approximating. 

• Real values in input vectors should be normalized such that 
x ∈ [0, 1). 

• Euclidean distance is commonly used to measure the distance 
between real-valued vectors, although other distance measures 
may be used (such as dot product), and data specific distance 
measures may be required for non-scalar attributes. 

• There should be sufficient training iterations to expose all the 
training data to the model multiple times. 

• The more complex the class distribution, the more codebook 
vectors will be required; some problems may need thousands. 

• Multiple passes of the SOM training algorithm are suggested for 
more robust usage, where the first pass has a large learning rate 
to prepare the codebook vectors and the second pass has a low 
learning rate and runs for a long time (perhaps 10-times more 
iterations). 

• The SOM can be visualized by calculating a Unified Distance 
Matrix (U-Matrix) that highlights the relationships between the 
nodes in the chosen topology. A Principal Component Analysis 
(PCA) or Sammon's Mapping can be used to visualize just the 
nodes of the network without their inter-relationships. 

• A rectangular 2D grid topology is typically used for a SOM, 
although toroidal and sphere topologies can be used. Hexagonal 
grids have demonstrated better results on some problems and grids 
with higher dimensions have been investigated. 

• The neuron positions can be updated incrementally or in a batch 
mode (once per epoch, after being exposed to all training samples). 
Batch-mode training is generally expected to result in a more stable 
network. 

• The learning rate and neighborhood size parameters typically 
decrease linearly with the training iterations, although non-linear 
functions may be used. 
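
The U-Matrix mentioned in the heuristics above can be sketched in a few lines. The fragment below is a minimal illustration, assuming codebook vectors are stored as hashes with :coord and :vector keys on a rectangular grid, and assuming a 4-connected neighborhood; the function name u_matrix is hypothetical.

```ruby
def euclidean_distance(c1, c2)
  Math.sqrt(c1.each_index.sum { |i| (c1[i] - c2[i])**2 })
end

# For each grid position, average the distance between the node's vector and
# the vectors of its immediate grid neighbors (4-connected, an assumption).
def u_matrix(codebook_vectors, width, height)
  by_coord = {}
  codebook_vectors.each { |c| by_coord[c[:coord]] = c[:vector] }
  Array.new(width) do |x|
    Array.new(height) do |y|
      neighbors = [[x-1, y], [x+1, y], [x, y-1], [x, y+1]].select { |xy| by_coord.key?(xy) }
      dists = neighbors.map { |xy| euclidean_distance(by_coord[[x, y]], by_coord[xy]) }
      dists.empty? ? 0.0 : dists.sum / dists.size
    end
  end
end
```

Large values in the resulting matrix mark nodes whose codebook vectors are far from those of their grid neighbors, which is commonly read as a boundary between clusters.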

8.6.6 Code Listing 

Listing 8.5 provides an example of the Self-Organizing Map algorithm 
implemented in the Ruby Programming Language. The problem is a 
feature detection problem, where the network is expected to learn a 
predefined shape based on being exposed to samples in the domain. The 
domain is two-dimensional x, y ∈ [0, 1], where a shape is pre-defined 
as a square in the middle of the domain x, y ∈ [0.3, 0.6]. The system 
is initialized to vectors within the domain although it is only exposed to 
samples within the pre-defined shape during training. The expectation 
is that the system will model the shape based on the observed samples. 

The algorithm is an implementation of the basic Self-Organizing Map 
algorithm based on the description in Chapter 3 of the seminal book on 
the technique [5]. The implementation is configured with a 4 x 5 grid of 
nodes, the Euclidean distance measure is used to determine the BMU 
and neighbors, and a Bubble neighborhood function is used. Error rates 
are presented to the console, and the codebook vectors themselves are 
described before and after training. The learning process is incremental 
rather than batch, for simplicity. 

An extension to this implementation would be to visualize the result- 
ing network structure in the domain - shrinking from a mesh that covers 
the whole domain, down to a mesh that only covers the pre-defined 
shape within the domain. 






def random_vector(minmax)
  return Array.new(minmax.size) do |i|
    minmax[i][0] + ((minmax[i][1] - minmax[i][0]) * rand())
  end
end

def initialize_vectors(domain, width, height)
  codebook_vectors = []
  width.times do |x|
    height.times do |y|
      codebook = {}
      codebook[:vector] = random_vector(domain)
      codebook[:coord] = [x,y]
      codebook_vectors << codebook
    end
  end
  return codebook_vectors
end

def euclidean_distance(c1, c2)
  sum = 0.0
  c1.each_index {|i| sum += (c1[i]-c2[i])**2.0}
  return Math.sqrt(sum)
end

def get_best_matching_unit(codebook_vectors, pattern)
  best, b_dist = nil, nil
  codebook_vectors.each do |codebook|
    dist = euclidean_distance(codebook[:vector], pattern)
    best,b_dist = codebook,dist if b_dist.nil? or dist<b_dist
  end
  return [best, b_dist]
end

def get_vectors_in_neighborhood(bmu, codebook_vectors, neigh_size)
  neighborhood = []
  codebook_vectors.each do |other|
    if euclidean_distance(bmu[:coord], other[:coord]) <= neigh_size
      neighborhood << other
    end
  end
  return neighborhood
end

def update_codebook_vector(codebook, pattern, lrate)
  codebook[:vector].each_with_index do |v,i|
    error = pattern[i]-codebook[:vector][i]
    codebook[:vector][i] += lrate * error
  end
end

def train_network(vectors, shape, iterations, l_rate, neighborhood_size)
  iterations.times do |iter|
    pattern = random_vector(shape)
    lrate = l_rate * (1.0-(iter.to_f/iterations.to_f))
    neigh_size = neighborhood_size * (1.0-(iter.to_f/iterations.to_f))
    bmu,dist = get_best_matching_unit(vectors, pattern)
    neighbors = get_vectors_in_neighborhood(bmu, vectors, neigh_size)
    neighbors.each do |node|
      update_codebook_vector(node, pattern, lrate)
    end
    puts ">training: neighbors=#{neighbors.size}, bmu_dist=#{dist}"
  end
end

def summarize_vectors(vectors)
  minmax = Array.new(vectors.first[:vector].size){[1,0]}
  vectors.each do |c|
    c[:vector].each_with_index do |v,i|
      minmax[i][0] = v if v<minmax[i][0]
      minmax[i][1] = v if v>minmax[i][1]
    end
  end
  s = ""
  minmax.each_with_index {|bounds,i| s << "#{i}=#{bounds.inspect} "}
  puts "Vector details: #{s}"
  return minmax
end

def test_network(codebook_vectors, shape, num_trials=100)
  error = 0.0
  num_trials.times do
    pattern = random_vector(shape)
    bmu,dist = get_best_matching_unit(codebook_vectors, pattern)
    error += dist
  end
  error /= num_trials.to_f
  puts "Finished, average error=#{error}"
  return error
end

def execute(domain, shape, iterations, l_rate, neigh_size, width, height)
  vectors = initialize_vectors(domain, width, height)
  summarize_vectors(vectors)
  train_network(vectors, shape, iterations, l_rate, neigh_size)
  test_network(vectors, shape)
  summarize_vectors(vectors)
  return vectors
end

if __FILE__ == $0
  # problem configuration
  domain = [[0.0,1.0],[0.0,1.0]]
  shape = [[0.3,0.6],[0.3,0.6]]
  # algorithm configuration
  iterations = 100
  l_rate = 0.3
  neigh_size = 5
  width, height = 4, 5
  # execute the algorithm
  execute(domain, shape, iterations, l_rate, neigh_size, width, height)
end

Listing 8.5: Self-Organizing Map in Ruby 



8.6.7 References 
Primary Sources 

The Self-Organizing Map was proposed by Kohonen in 1982 in a study 
that included the mathematical basis for the approach, summary of 
related physiology, and simulation on demonstration problem domains 
using one and two dimensional topological structures [3]. This work was 
tightly related to two other papers published at close to the same time on 
topological maps and self-organization [1, 2]. 

Learn More 

Kohonen provides a detailed introduction and summary of the Self- 
Organizing Map in a journal article [4]. Kohonen et al. provide a 
practical presentation of the algorithm and heuristics for configuration 
in the technical report written to accompany the released SOM-PAK 
implementation of the algorithm for academic research [6]. The seminal 
book on the technique is "Self-Organizing Maps" by Kohonen, which 
includes chapters dedicated to the description of the basic approach, 
physiological interpretations of the algorithm, variations, and summaries 
of application areas [5]. 
Chapter 9 

Advanced Topics 



This chapter discusses a number of advanced topics that may be con- 
sidered once one or more of the algorithms described in this book have 
been mastered. 

The topics in this section consider some practical concerns such as: 

• How to implement an algorithm using a different programming 
paradigm (Section 9.1). 

• How to devise and investigate a new biologically-inspired algorithm 
(Section 9.2). 

• How to test algorithm implementations to ensure they are imple- 
mented correctly (Section 9.3). 

• How to visualize problems, algorithm behavior and candidate 
solutions (Section 9.4). 

• How to direct these algorithms toward practical problem solving 
(Section 9.5). 

• Issues to consider when benchmarking and comparing the capabil- 
ities of algorithms (Section 9.6). 

The objective of this chapter is to illustrate the concerns and skills 
necessary for taking the algorithms described in this book into the 
real world. 
9.1 Programming Paradigms 

This section discusses three standard programming paradigms that may 
be used to implement the algorithms described throughout the book: 

• Procedural Programming (Section 9.1.1) 

• Object-Oriented Programming (Section 9.1.2) 

• Flow Programming (Section 9.1.3) 

Each paradigm is described and an example implementation is pro- 
vided using the Genetic Algorithm (described in Section 3.2) as a 
context. 

9.1.1 Procedural Programming 

This section considers the implementation of algorithms from the Clever 
Algorithms project in the Procedural Programming Paradigm. 

Description 

The procedural programming paradigm (also called imperative program- 
ming) is concerned with defining a linear procedure or sequence of pro- 
gramming statements. A key feature of the paradigm is the partitioning 
of functionality into small discrete re-usable modules called procedures 
(subroutines or functions) that act like small programs themselves with 
their own scope, inputs and outputs. A procedural code example is 
executed from a single point of control or entry point which calls out 
into declared procedures, which in turn may call other procedures. 

Procedural programming was an early so-called 'high-level program- 
ming paradigm' (compared to lower-level machine code) and is the most 
common and well understood form of programming. Newer paradigms 
(such as Object-Oriented programming) and modern business programming 
languages (such as C++, Java and C#) are built on the 
principles of procedural programming. 

All algorithms in this book were implemented using a procedural 
programming paradigm in the Ruby Programming Language. A pro- 
cedural representation was chosen to provide the most transferrable 
instantiation of the algorithm implementations. Many languages support 
the procedural paradigm and procedural code examples are expected 
to be easily ported to popular paradigms such as object-oriented and 
functional. 
Example 

Listing 3.1 in Section 3.2 provides an example of the Genetic Algorithm 
implemented in the Ruby Programming Language using the procedural 
programming paradigm. 

9.1.2 Object-Oriented Programming 

This section considers the implementation of algorithms from the Clever 
Algorithms project in the Object-Oriented Programming Paradigm. 

Description 

The Object-Oriented Programming (OOP) paradigm is concerned with 
modeling problems in terms of entities called objects that have attributes 
and behaviors (data and methods) and interact with other entities 
using message passing (calling methods on other entities). An object 
developer defines a class or template for the entity, which is instantiated 
or constructed and then may be used in the program. 

Objects can extend other objects, inheriting some or all of the at- 
tributes and behaviors from the parent providing specific modular reuse. 
Objects can be treated as a parent type (an object in its inheritance 
tree) allowing the use or application of the objects in the program with- 
out the caller knowing the specifics of the behavior or data inside the 
object. This general property is called polymorphism, which exploits 
the encapsulation of attributes and behavior within objects and their 
capability of being treated (viewed or interacted with) as a parent type. 

Organizing functionality into objects allows for additional constructs 
such as abstract types where functionality is only partially defined and 
must be completed by descendant objects, overriding where descending 
objects re-define behavior defined in a parent object, and static classes 
and behaviors where behavior is executed on the object template rather 
than the object instance. For more information on Object-Oriented 
programming and software design refer to a good textbook on the 
subject, such as Booch [1] or Meyer [3]. 

There are common ways of solving discrete problems using object- 
oriented programs called patterns. They are organizations of behavior 
and data that have been abstracted and presented as a solution or 
idiom for a class of problem. The Strategy Pattern is an object-oriented 
pattern that is suited to implementing an algorithm. This pattern is 
intended to encapsulate the behavior of an algorithm as a strategy 
object where different strategies can be used interchangeably on a given 
context or problem domain. This strategy can be useful in situations 
where the performance or capability of a range of different techniques 
needs to be assessed on a given problem (such as algorithm racing or 
bake-offs). Additionally, the problem or context can also be modeled 
as an interchangeable object, allowing both algorithms and problems 
to be used interchangeably. This method is used in object-oriented 
algorithm frameworks. For more information on the strategy pattern or 
object-oriented design patterns in general, refer to Gamma et al. [2]. 

Example 

Listing 9.1 provides an example of the Genetic Algorithm implemented 
in the Ruby Programming Language using the Object-Oriented Pro- 
gramming Paradigm. 

The implementation provides general problem and strategy classes 
that define their behavioral expectations. A OneMax problem class and 
a GeneticAlgorithm strategy class are specified. The algorithm makes 
few assumptions of the problem other than it can assess candidate 
solutions and determine whether a given solution is optimal. The 
problem makes very few assumptions about candidate solutions other 
than they are map data structures that contain a binary string and 
fitness key-value pairs. The use of the Strategy Pattern allows a new 
algorithm to be easily defined to work with the existing problem, and 
new problems to be defined for the Genetic Algorithm to execute. 

Note that Ruby does not support abstract classes, so this construct 
is simulated by defining methods that raise an exception if they are not 
overridden by descendant classes. 

# A problem template
class Problem
  def assess(candidate_solution)
    raise "A problem has not been defined"
  end

  def is_optimal?(candidate_solution)
    raise "A problem has not been defined"
  end
end

# A strategy template
class Strategy
  def execute(problem)
    raise "A strategy has not been defined!"
  end
end

# An implementation of the OneMax problem using the problem template
class OneMax < Problem

  attr_reader :num_bits

  def initialize(num_bits=64)
    @num_bits = num_bits
  end

  def assess(candidate_solution)
    if candidate_solution[:bitstring].length != @num_bits
      raise "Expected #{@num_bits} bits in candidate solution."
    end
    sum = 0
    candidate_solution[:bitstring].size.times do |i|
      sum += 1 if candidate_solution[:bitstring][i].chr =='1'
    end
    return sum
  end

  def is_optimal?(candidate_solution)
    return candidate_solution[:fitness] == @num_bits
  end
end

# An implementation of the Genetic algorithm using the strategy template
class GeneticAlgorithm < Strategy

  attr_reader :max_generations, :population_size, :p_crossover, :p_mutation

  def initialize(max_gens=100, pop_size=100, crossover=0.98, mutation=1.0/64.0)
    @max_generations = max_gens
    @population_size = pop_size
    @p_crossover = crossover
    @p_mutation = mutation
  end

  def random_bitstring(num_bits)
    return (0...num_bits).inject(""){|s,i| s<<((rand<0.5) ? "1" : "0")}
  end

  def binary_tournament(pop)
    i, j = rand(pop.size), rand(pop.size)
    j = rand(pop.size) while j==i
    return (pop[i][:fitness] > pop[j][:fitness]) ? pop[i] : pop[j]
  end

  def point_mutation(bitstring)
    child = ""
    bitstring.size.times do |i|
      bit = bitstring[i].chr
      child << ((rand()<@p_mutation) ? ((bit=='1') ? "0" : "1") : bit)
    end
    return child
  end

  def uniform_crossover(parent1, parent2)
    return ""+parent1 if rand()>=@p_crossover
    child = ""
    parent1.length.times do |i|
      child << ((rand()<0.5) ? parent1[i].chr : parent2[i].chr)
    end
    return child
  end

  def reproduce(selected)
    children = []
    selected.each_with_index do |p1, i|
      p2 = (i.modulo(2)==0) ? selected[i+1] : selected[i-1]
      p2 = selected[0] if i == selected.size-1
      child = {}
      child[:bitstring] = uniform_crossover(p1[:bitstring], p2[:bitstring])
      child[:bitstring] = point_mutation(child[:bitstring])
      children << child
      break if children.size >= @population_size
    end
    return children
  end

  def execute(problem)
    population = Array.new(@population_size) do |i|
      {:bitstring=>random_bitstring(problem.num_bits)}
    end
    population.each{|c| c[:fitness] = problem.assess(c)}
    best = population.sort{|x,y| y[:fitness] <=> x[:fitness]}.first
    @max_generations.times do |gen|
      selected = Array.new(population_size){|i| binary_tournament(population)}
      children = reproduce(selected)
      children.each{|c| c[:fitness] = problem.assess(c)}
      children.sort!{|x,y| y[:fitness] <=> x[:fitness]}
      best = children.first if children.first[:fitness] >= best[:fitness]
      population = children
      puts " > gen #{gen}, best: #{best[:fitness]}, #{best[:bitstring]}"
      break if problem.is_optimal?(best)
    end
    return best
  end
end

if __FILE__ == $0
  # problem configuration
  problem = OneMax.new
  # algorithm configuration
  strategy = GeneticAlgorithm.new
  # execute the algorithm
  best = strategy.execute(problem)
  puts "done! Solution: f=#{best[:fitness]}, s=#{best[:bitstring]}"
end



Listing 9.1: Genetic Algorithm in Ruby using OOP 
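
To illustrate how the Strategy Pattern lets a different algorithm be swapped in against the same problem, the following sketch defines a second strategy. The RandomSearch class is hypothetical and is not part of the Clever Algorithms listings; the Problem, Strategy and OneMax classes are repeated here in a reduced form only so that the fragment is self-contained.

```ruby
# Reduced templates, repeated so this sketch runs on its own
class Problem
  def assess(candidate_solution)
    raise "A problem has not been defined"
  end

  def is_optimal?(candidate_solution)
    raise "A problem has not been defined"
  end
end

class Strategy
  def execute(problem)
    raise "A strategy has not been defined!"
  end
end

class OneMax < Problem
  attr_reader :num_bits

  def initialize(num_bits=64)
    @num_bits = num_bits
  end

  def assess(candidate_solution)
    candidate_solution[:bitstring].count("1")
  end

  def is_optimal?(candidate_solution)
    candidate_solution[:fitness] == @num_bits
  end
end

# Hypothetical alternative strategy: sample random bitstrings, keep the best
class RandomSearch < Strategy
  def initialize(max_evaluations=1000)
    @max_evaluations = max_evaluations
  end

  def execute(problem)
    best = nil
    @max_evaluations.times do
      bits = Array.new(problem.num_bits) { rand < 0.5 ? "1" : "0" }.join
      candidate = {:bitstring => bits}
      candidate[:fitness] = problem.assess(candidate)
      best = candidate if best.nil? or candidate[:fitness] > best[:fitness]
      break if problem.is_optimal?(best)
    end
    best
  end
end
```

Because the caller only relies on the execute method and the two problem methods, a line such as RandomSearch.new.execute(OneMax.new) can be used wherever a GeneticAlgorithm instance would be used, which is what makes algorithm racing convenient in this style.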



9.1.3 Flow Programming 



This section considers the implementation of algorithms from the Clever 
Algorithms project in the Flow Programming paradigm. 
Description 

Flow, data-flow, or pipeline programming involves chaining a sequence 
of smaller processes together and allowing a flow of information through 
the sequence in order to perform the desired computation. Units in 
the flow are considered black-boxes that communicate with each other 
using message passing. The information that is passed between the 
units is considered a stream and a given application may have one or 
more streams of potentially varying direction. Discrete information in a 
stream is partitioned into information packets which are passed from 
unit-to-unit via message buffers, queues or similar data structures. 

A flow organization allows computing units to be interchanged readily. 
It also allows for variations of the pipeline to be considered with minor 
reconfiguration. A flow or pipelining structure is commonly used by 
software frameworks for the organization within a given algorithm 
implementation, allowing the specification of operators that manipulate 
candidate solutions to be varied and interchanged. 

For more information on Flow Programming see a good textbook on 
the subject, such as Morrison [4]. 
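
The idea of black-box units connected by queues can be shown with a much smaller pipeline than a full algorithm. The sketch below is an illustration only: the run_pipeline function, the two arithmetic stages, and the use of a :done symbol to mark the end of the stream are all assumptions of this example.

```ruby
require 'thread'

# Run a two-stage pipeline over the given items and return the results.
def run_pipeline(items)
  stage1_in, stage2_in, results = Queue.new, Queue.new, Queue.new

  # Stage 1: square each packet and pass it on; :done ends the stream.
  squarer = Thread.new do
    while (packet = stage1_in.pop) != :done
      stage2_in.push(packet * packet)
    end
    stage2_in.push(:done)
  end

  # Stage 2: add one to each packet and emit it as a result.
  incrementer = Thread.new do
    while (packet = stage2_in.pop) != :done
      results.push(packet + 1)
    end
  end

  # Inject the information packets, then signal the end of the stream.
  items.each { |item| stage1_in.push(item) }
  stage1_in.push(:done)
  [squarer, incrementer].each(&:join)
  Array.new(results.size) { results.pop }
end
```

Each stage knows only about its input and output queues, so a stage can be replaced or an extra stage inserted by rewiring the queues, without changing the other units.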



Example 

Listing 9.2 provides an example of the Genetic Algorithm implemented 
in the Ruby Programming Language using the Flow Programming 
paradigm. Each unit is implemented as an object that executes its logic 
within a standalone thread that reads input from the input queue and 
writes data to its output queue. The implementation shows four flow 
units organized into a cyclic graph where the output queue of one unit 
is used as the input of the next unit in the cycle (EvalFlowUnit to 
StopConditionUnit to SelectFlowUnit to VariationFlowUnit). 

Candidate solutions are the unit of data that is passed around in 
the flow between units. When the system is started it does not have 
any information to process until a set of random solutions are injected 
into the evaluation unit's input queue. The solutions are evaluated and 
sent to the stop condition unit where the constraints of the algorithm 
execution are tested (optima found or maximum number of evaluations) 
and the candidates are passed on to the selection flow unit. The selection 
unit collects a predefined number of candidate solutions then passes the 
better solutions onto the variation unit. The variation unit performs 
crossover and mutation on each pair of candidate solutions and sends 
the results to the evaluation unit, completing the cycle. 

require 'thread'

# Generic flow unit
class FlowUnit
  attr_reader :queue_in, :queue_out, :thread

  def initialize(q_in=Queue.new, q_out=Queue.new)
    @queue_in, @queue_out = q_in, q_out
    start()
  end

  def execute
    raise "FlowUnit not defined!"
  end

  def start
    puts "Starting flow unit: #{self.class.name}!"
    @thread = Thread.new do
      execute() while true
    end
  end
end

# Evaluation of solutions flow unit
class EvalFlowUnit < FlowUnit
  def onemax(bitstring)
    sum = 0
    bitstring.size.times {|i| sum+=1 if bitstring[i].chr=='1'}
    return sum
  end

  def execute
    data = @queue_in.pop
    data[:fitness] = onemax(data[:bitstring])
    @queue_out.push(data)
  end
end

# Stop condition flow unit
class StopConditionUnit < FlowUnit
  attr_reader :best, :num_bits, :max_evaluations, :evals

  def initialize(q_in=Queue.new, q_out=Queue.new, max_evaluations=10000, num_bits=64)
    @best, @evals = nil, 0
    @num_bits = num_bits
    @max_evaluations = max_evaluations
    super(q_in, q_out)
  end

  def execute
    data = @queue_in.pop
    if @best.nil? or data[:fitness] > @best[:fitness]
      @best = data
      puts " >new best: #{@best[:fitness]}, #{@best[:bitstring]}"
    end
    @evals += 1
    if @best[:fitness]==@num_bits or @evals>=@max_evaluations
      puts "done! Solution: f=#{@best[:fitness]}, s=#{@best[:bitstring]}"
      @thread.exit()
    end
    @queue_out.push(data)
  end
end

# Fitness-based selection flow unit
class SelectFlowUnit < FlowUnit
  def initialize(q_in=Queue.new, q_out=Queue.new, pop_size=100)
    @pop_size = pop_size
    super(q_in, q_out)
  end

  def binary_tournament(pop)
    i, j = rand(pop.size), rand(pop.size)
    j = rand(pop.size) while j==i
    return (pop[i][:fitness] > pop[j][:fitness]) ? pop[i] : pop[j]
  end

  def execute
    population = Array.new
    population << @queue_in.pop while population.size < @pop_size
    @pop_size.times do |i|
      @queue_out.push(binary_tournament(population))
    end
  end
end

# Variation flow unit
class VariationFlowUnit < FlowUnit
  def initialize(q_in=Queue.new, q_out=Queue.new, crossover=0.98, mutation=1.0/64.0)
    @p_crossover = crossover
    @p_mutation = mutation
    super(q_in, q_out)
  end

  def uniform_crossover(parent1, parent2)
    return ""+parent1 if rand()>=@p_crossover
    child = ""
    parent1.length.times do |i|
      child << ((rand()<0.5) ? parent1[i].chr : parent2[i].chr)
    end
    return child
  end

  def point_mutation(bitstring)
    child = ""
    bitstring.size.times do |i|
      bit = bitstring[i].chr
      child << ((rand()<@p_mutation) ? ((bit=='1') ? "0" : "1") : bit)
    end
    return child
  end

  def reproduce(p1, p2)
    child = {}
    child[:bitstring] = uniform_crossover(p1[:bitstring], p2[:bitstring])
    child[:bitstring] = point_mutation(child[:bitstring])
    return child
  end

  def execute
    parent1 = @queue_in.pop
    parent2 = @queue_in.pop
    @queue_out.push(reproduce(parent1, parent2))
    @queue_out.push(reproduce(parent2, parent1))
  end
end

def random_bitstring(num_bits)
  return (0...num_bits).inject(""){|s,i| s<<((rand<0.5) ? "1" : "0")}
end

def search(population_size=100, num_bits=64)
  # create the pipeline
  eval = EvalFlowUnit.new
  stopcondition = StopConditionUnit.new(eval.queue_out)
  selection = SelectFlowUnit.new(stopcondition.queue_out)
  variation = VariationFlowUnit.new(selection.queue_out, eval.queue_in)
  # push random solutions into the pipeline
  population_size.times do
    solution = {:bitstring=>random_bitstring(num_bits)}
    eval.queue_in.push(solution)
  end
  stopcondition.thread.join
  return stopcondition.best
end

if __FILE__ == $0
  best = search()
  puts "done! Solution: f=#{best[:fitness]}, s=#{best[:bitstring]}"
end



Listing 9.2: Genetic Algorithm in Ruby using Flow Programming 



9.1.4 Other Paradigms 

A number of popular and common programming paradigms have been 
considered in this section, although many more have not been described. 

Many programming paradigms are not appropriate for implementing 
algorithms as-is, but may be useful with the algorithm as a component 
in a broader system, such as Agent-Oriented Programming where the 
algorithm may be a procedure available to the agent. Meta-programming 
is a case where the capabilities of the paradigm may be used for parts 
of an algorithm implementation, such as the manipulation of candidate 
programs in Genetic Programming (Section 3.3). Aspect-Oriented 
Programming could be layered over an object-oriented algorithm 
implementation and used to separate the concerns of termination conditions 
and best solution logging. 

Other programming paradigms provide variations on what has al- 
ready been described, such as Functional Programming which would be 
similar to the procedural example, and Event-Driven Programming that 
would not be too dissimilar in principle to the Flow-Based Program- 
ming. Another example is the popular idiom of Map-Reduce which is 
an application of functional programming principles organized into a 
data flow model. 
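
As a small illustration of the Map-Reduce idiom applied to a step of an algorithm, the sketch below scores a population of bitstrings (the map step) and then folds the scored pairs down to the single best one (the reduce step). The function names and the choice of the OneMax scoring function are assumptions made for this example.

```ruby
# Map step: score one bitstring with OneMax (the count of '1' bits).
def onemax(bitstring)
  bitstring.count("1")
end

# Map every candidate to a [fitness, bitstring] pair, then reduce the pairs
# to the best one (ties keep the earlier candidate).
def best_candidate(bitstrings)
  scored = bitstrings.map { |b| [onemax(b), b] }
  scored.reduce { |best, pair| pair[0] > best[0] ? pair : best }
end
```

The map step has no shared state, so in a distributed setting each candidate could in principle be evaluated on a different worker before the reduce step combines the results.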

Finally, there are programming paradigms that are not relevant or 
feasible to consider for implementing algorithms, such as Logic 
Programming. 
9.2 Devising New Algorithms 

This section provides a discussion of some of the approaches that may 
be used to devise new algorithms and systems inspired by biological 
systems for addressing mathematical and engineering problems. This 
discussion covers: 

• An introduction to adaptive systems and complex adaptive systems 
as an approach for studying natural phenomena and deducing 
adaptive strategies that may be the basis for algorithms (Sec- 
tion 9.2.1). 

• An introduction to some frameworks and methodologies for reduc- 
ing natural systems into abstract information processing proce- 
dures and ultimately algorithms (Section 9.2.2). 

• A summary of a methodology that may be used to investigate 
a devised adaptive system that considers the trade-off in model 
fidelity and descriptive power proposed by Goldberg, a pioneer in 
the Evolutionary Computation field (Section 9.2.3). 

9.2.1 Adaptive Systems 

Many algorithms, such as the Genetic Algorithm, have come from the 
study and models of complex and adaptive systems. Adaptive systems 
research provides a methodology by which these systems can be system- 
atically investigated resulting in adaptive plans or strategies that can 
provide the basis for new and interesting algorithms. 

Holland proposed a formalism in his seminal work on adaptive 
systems that provides a general manner in which to define an adaptive 
system [7]. Phrasing systems in this way provides a framework under 
which adaptive systems may be evaluated and compared relative to each 
other, the difficulties and obstacles of investigating specific adaptive 
systems are exposed, and the abstracted principles of different system 
types may be distilled. This section provides a summary of Holland's 
seminal adaptive systems formalism and considers clonal selection as an 
example of an adaptive plan. 

Adaptive Systems Formalism 

This section presents a brief review of Holland's adaptive systems formal- 
ism described in [7] (Chapter 2). This presentation focuses particularly 
on the terms and their description, and has been hybridized with the 
concise presentation of the formalism by De Jong [9] (page 6). The 
formalism is divided into sections: 1) Primary Objects summarized in 
Table 9.1, and 2) Secondary Objects summarized in Table 9.2. Primary 
Objects are the conventional objects of an adaptive system: the envi- 
ronment e, the strategy or adaptive plan that creates solutions in the 
environment s, and the utility assigned to created solutions U. 



Term   Object        Description 

e      Environment   The environment of the system undergoing adaptation. 

s      Strategy      The adaptive plan which determines successive 
                     structural modifications in response to the 
                     environment. 

U      Utility       A measure of performance or payoff of different 
                     structures in the environment. Maps a given 
                     solution (A) to a real number evaluation. 



Table 9.1: Primary Objects in the adaptive systems formalism. 



Secondary Objects extend beyond the primary objects providing the 
detail of the formalism. These objects suggest a broader context than 
that of the instance specific primary objects, permitting the evaluation 
and comparison of sets of objects such as plans (S), environments (E), 
search spaces (A), and operators (O). 

A given adaptive plan acts in discrete time t, which is a useful 
simplification for analysis and computer simulation. A framework for a 
given adaptive system requires the definition of a set of strategies S, a 
set of environments E, and criterion for ranking strategies X. A given 
adaptive plan is specified within this framework given the following 
set of objects: a search space A, a set of operators O, and feedback 
from the environment I. Holland proposed a series of fundamental
questions when considering the definition for an adaptive system, which 
he rephrases within the context of the formalism (see Table 9.3). 
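As a rough illustration of how the objects of the formalism fit together, the following Ruby sketch treats bitstrings as the search space A, a single bit-flip as the only operator in O, the number of ones as the utility U, and the utility observed for each candidate as the feedback I. The function names (utility, flip_one_bit, adaptive_plan) and the choice of problem are assumptions made for illustration; they are not part of Holland's presentation.

```ruby
# Utility U: maps a structure from the search space A to a real number.
def utility(structure)
  structure.count("1").to_f
end

# An operator from O: transforms A_t into a candidate for A_{t+1}.
def flip_one_bit(structure, random)
  i = random.rand(structure.size)
  child = structure.dup
  child[i] = (child[i] == "1") ? "0" : "1"
  child
end

# Strategy s: an adaptive plan acting in discrete time t.
# Feedback I is the utility observed for each candidate structure.
def adaptive_plan(initial, steps, random)
  current = initial
  current_payoff = utility(initial)
  steps.times do
    candidate = flip_one_bit(current, random)
    feedback = utility(candidate)
    if feedback >= current_payoff
      current = candidate
      current_payoff = feedback
    end
  end
  current
end

if __FILE__ == $0
  random = Random.new(1)
  puts adaptive_plan("0" * 16, 200, random)
end
```

In this sketch the plan retains no memory M beyond the current structure and its payoff, which is one possible answer to the question of what part of the input history is retained.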

Some Examples 

Holland provides a series of illustrations rephrasing common adaptive 
systems in the context of the formalism [7] (pages 35-36). Examples 
include: genetics, economics, game playing, pattern recognition, control, 
function optimization, and the central nervous system. The formalism 
is applied to investigate his schemata theorem, reproductive plans, 
and genetic plans. These foundational models became the field of 
Evolutionary Computation (Chapter 3). 

Term  Object        Description
A     Search Space  The set of attainable structures, solutions, and the
                    domain of action for an adaptive plan.
E     Environments  The range of different environments, where e is an
                    instance. It may also represent the unknowns of the
                    strategy about the environment.
O     Operators     Set of operators applied to an instance of A at time t
                    (A_t) to transform it into A_{t+1}.
S     Strategies    Set of plans applicable for a given environment (where
                    s is an instance), that use operators from the set O.
X     Criterion     Used to compare strategies (in the set S), under the
                    set of environments (E). Takes into account the
                    efficiency of a plan in different environments.
I     Feedback      Set of possible environmental inputs and signals
                    providing dynamic information to the system about the
                    performance of a particular solution A in a particular
                    environment E.
M     Memory        The memory or retained parts of the input history (I)
                    for a solution (A).

Table 9.2: Secondary Objects in the adaptive systems formalism.

From working within the formalism, Holland makes six observations
regarding obstacles that may be encountered whilst investigating
adaptive systems [7] (pages 159-160):

• High cardinality of A: makes searches long and storage of relevant
data difficult.

• Appropriateness of credit: knowledge of the properties about 
'successful' structures is incomplete, making it hard to predict 
good future structures from past structures. 

• High dimensionality of U on an e: performance is a function of a 
large number of variables which is difficult for classical optimization 
methods. 

• Non-linearity of U on an e: many false optima or false peaks, 
resulting in the potential for a lot of wasted computation. 

• Mutual interference of search and exploitation: the exploration (ac- 
quisition of new information), exploitation (application of known 
information) trade-off. 

• Relevant non-payoff information: the environment may provide a 
lot more information in addition to payoff, some of which may be 
relevant to improved performance.

Question                                                     Formal
To what parts of its environment is the organism (system,    What is E?
organization) adapting?
How does the environment act upon the adapting organism      What is I?
(system, organization)?
What structures are undergoing adaptation?                   What is A?
What are the mechanisms of adaptation?                       What is O?
What part of the history of its interaction with the         What is M?
environment does the organism (system, organization)
retain in addition to that summarized in the structure
tested?
What limits are there to the adaptive process?               What is S?
How are different (hypotheses about) adaptive processes to   What is X?
be compared?

Table 9.3: Questions when investigating adaptive systems, taken from
[7] (pg. 29).



Cavicchio provides perhaps one of the first applications of the for- 
malism (after Holland) in his dissertation investigating Holland's repro- 
ductive plans [10] (and to a lesser extent in [11]). The work summarizes 
the formalism, presenting essentially the same framework, although 
he provides a specialization of the search space A. The search space 
is broken down into a representation (codes), solutions (devices), and 
a mapping function from representation to solutions. The variation 
highlights the restriction the representation and mapping have on the 
designs available to the adaptive plan. Further, such mappings may not
be one-to-one: there may be many instances in the representation space
that map to the same solution (or the reverse).
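A minimal sketch of a many-to-one mapping in the spirit of this decomposition, assuming a bitstring representation (codes) in which a solution (device) is simply the number of set bits. The decode function and the grouping helper are hypothetical and are not taken from Cavicchio's work.

```ruby
# Mapping function from the representation space (codes) to the
# solution space (devices). A device here is the number of set bits,
# so several codes map to the same device.
def decode(code)
  code.count("1")
end

# Group every code of a given length by the device it maps to.
def codes_by_device(length)
  codes = (0...(2**length)).map { |n| n.to_s(2).rjust(length, "0") }
  codes.group_by { |code| decode(code) }
end

if __FILE__ == $0
  codes_by_device(3).each do |device, codes|
    puts "device #{device}: #{codes.join(', ')}"
  end
end
```

With codes of length 3 there are eight points in the representation space but only four distinct devices, so an adaptive plan operating on the codes is restricted to those four designs.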

Although not explicitly defined, Holland's specification of structures 
A is clear in pointing out that the structures are not bound to a level of 
abstraction; the definition covers structures at all levels. Nevertheless, 
Cavicchio's specialization for a representation-solution mapping was 
demonstrated to be useful in his exploration of reproductive plans (early 
Genetic Algorithms). He proposed that an adaptive system is first order
if the utility function U for structures on an environment encompasses
feedback I.

Cavicchio described the potential independence (component-wise) 
and linearity of the utility function with respect to the representation 
used. De Jong also employed the formalism to investigate reproductive 
plans in his dissertation research [9]. He indicated that the formalism 
covers the essential characteristics of adaptation, where the performance 
of a solution is a function of its characteristics and its environment.
Adaptation is defined as a strategy for generating better-performing
solutions to a problem by reducing initial uncertainty about the environ- 
ment via feedback from the evaluation of individual solutions. De Jong 
used the formalism to define a series of genetic reproductive plans, which 
he investigated in the context of function optimization. 



Complex Adaptive Systems 

Adaptive strategies are typically complex because they result in irre- 
ducible emergent behaviors that occur as a result of the non-linear 
interactions of system components. The study of Complex Adaptive 
Systems (CAS) is the study of high-level abstractions of natural and 
artificial systems that are generally impervious to traditional analy- 
sis techniques. Macroscopic patterns emerge from the dynamic and 
non-linear interactions of the system's low-level (microscopic) adaptive 
agents. The emergent patterns are more than the sum of their parts. 
As such, traditional reductionist methodologies fail to describe how 
the macroscopic patterns emerge. Holistic and totalistic investigatory 
approaches are applied that relate the simple rules and interactions of 
the simple adaptive agents to their emergent effects in a 'bottom-up' 
manner. 

Some relevant examples of CAS include: the development of embryos, 
ecologies, genetic evolution, thinking and learning in the brain, weather 
systems, social systems, insect swarms, bacteria becoming resistant to 
an antibiotic, and the function of the adaptive immune system. 

The field of CAS was founded at the Santa Fe Institute (SFI), in the 
late 1980s by a group of physicists, economists, and others interested 
in the study of complex systems in which the agents of those systems 
change [1]. One of the most significant contributors to the inception 
of the field from the perspective of adaptation was Holland. He was 
interested in the question of how computers could be programmed so 
that problem-solving capabilities are built up by specifying: "what is 
to be done" (inductive information processing) rather than "how to 
do it" (deductive information processing). In the 1992 reprint of his
book he provided a summary of CAS with a computational example
called ECHO [7]. His work on CAS was expanded in a later book which
provided an in-depth study of the topic [8].

There is no clear definition of a Complex Adaptive System, rather
sets of parsimonious principles and properties, with many different
researchers in the field defining their own nomenclature. Popular
definitions beyond Holland's work include those of Gell-Mann [4] and
Arthur [2].

9.2.2 Biologically Inspired Algorithms

Explicit methodologies have been devised and used for investigating nat- 
ural systems with the intent of devising new computational intelligence 
techniques. This section introduces two such methodologies taken from 
the field of Artificial Immune Systems (Chapter 7). 

Conceptual Framework 

Although the progression from an inspiring biological system to an 
inspired computation system may appear to be an intuitive process, 
it can involve problems of standardization of nomenclature, effective 
abstraction and departure from biology, and rigor. Stepney, et al. 
caution that by following a process that lacks the detail of modeling, 
one may fall into the trap of reasoning by metaphor [12-14]. 

Besides the lack of rigor, the trap suggests that such reasoning and
lack of objective analysis limit and bias the suitability and
applicability of resultant algorithms. They propose that many algorithms
in the field of Artificial Immune Systems (and beyond) have succumbed to
this trap. This observation resulted in the development and application
of a conceptual framework to provide a general process that may be
applied in the field of Biologically Inspired Computation toward
realizing Biologically Inspired Computational Intelligence systems.

The conceptual framework is comprised of the following actors and 
steps: 

1. Biological System: The driving motivation for the work that 
possesses some innate information processing qualities. 

2. Probes: Observations and experiments that provide a partial or 
noisy perspective of the biological system. 

3. Models: From probes, abstract and simplified models of the infor- 
mation processing qualities of the system are built and validated. 

4. Framework: Built and validated analytical computational frame- 
works. Validation may use mathematical analysis, benchmark 
problems, and engineering demonstration. 

5. Algorithms: The framework provides the principles for designing 
and analyzing algorithms that may be general and applicable to 
domains unrelated to the biological motivation. 

Immunology as Information Processing 

Forrest and Hofmeyr summarized their AIS research efforts at the
University of New Mexico and the Santa Fe Institute as "immunology as
information processing" [3]. They define information as spatio-temporal
patterns that can be abstracted and described independent of the bio- 
logical system and information processing as computation with these 
patterns. They proposed that such patterns are encoded in the proteins 
and other molecules of the immune system, and that they govern the 
behavior of the biological system. They suggest that their information 
processing perspective can be contrasted with the conventional struc- 
tural perspective of cellular interactions as mechanical devices. They 
consider a simple four-step procedure for the investigation of immunol- 
ogy as information processing, transitioning from the biological system 
to a usable computational tool: 

1. Identify a specific mechanism that appears to be interesting com- 
putationally. 

2. Write a computer program that implements or models the mecha- 
nism. 

3. Study its properties through simulation and mathematical analysis. 

4. Demonstrate capabilities either by applying the model to a biologi- 
cal question of interest or by showing how it can be used profitably 
in a computer science setting. 

The procedure is similar to that outlined in the conceptual framework
for Biologically Inspired Algorithms in that, in addition to identifying
biological mechanisms (input) and demonstrating resultant algorithms
(output), the procedure 1) highlights the need for abstraction involving
modeling the identified mechanism, and 2) highlights the need to analyze 
the models and abstractions. The procedure of Forrest and Hofmeyr 
can be used to specialize the conceptual framework of Stepney et al. by 
clearly specifying the immunological information processing focus. 

9.2.3 Modeling a New Strategy 

Once an abstract information processing system is devised it must be 
investigated in a systematic manner. There is a range of modeling
techniques for such a system, from weak and rapid to realize to strong
and slow to realize. This section considers the trade-offs in modeling
an adaptive technique.

Engineers and Mathematicians 

Goldberg describes the airplane and other products of engineering as 
material machines, and distinguishes them from the engineering of 
genetic algorithms and other adaptive systems as conceptual machines.
He argues the methodological distinction between the two is
counterproductive and harmful from the perspective of conceptual machines,
specifically that the methodology of the material is equally applicable 
to that of the conceptual [5]. 

The obsession with mathematical rigor in computer science, although
extremely valuable, is not effective in the investigation of adaptive
systems given their complexity. Goldberg cites the airplane as an
example where the engineering invention is used and trusted without a
formal proof that the invention works (that an airplane can fly). 1

This defense leads to what Goldberg refers to as the economy of
design, which is demonstrated with a trade-off that distinguishes
'model description' (mathematician-scientists), which is concerned with
model fidelity, from 'model prescription' (engineer-inventor), which is
concerned with a working product. In descriptive modeling the model is
the thing, whereas in prescriptive modeling the object is the thing. In
the latter, the model (and thus its utility) serves the object; in the
former, model accuracy may be of primary concern. This economy of
modeling provides a perspective that distinguishes the needs of the
prescriptive and descriptive fields of investigation.

The mathematician-scientist is interested in increasing model accu- 
racy at the expense of speed (slow), whereas the engineer may require 
a marginally predictive (less accurate) model relatively quickly. This 
trade-off between high-cost high-accuracy models and low-cost low- 
fidelity models is what may be referred to as the modeling spectrum that 
assists in selecting an appropriate level of modeling. Goldberg proposes 
that the field of Genetic Algorithms expends too much effort at either
end of this spectrum. There is much work where there is an obsession
with blind-prototyping many different tweaks in the hope of striking it 
lucky with the right mechanism, operator, or parameter. Alternatively, 
there is also an obsession with detailed mathematical models such as 
differential equations and Markov chains. The middle ground of the 
spectrum, what Goldberg refers to as little models, is a valuable economic
modeling consideration for the investigation of conceptual machines to
"do good science through good engineering".

Methodology 

The methodology has been referred to as post-modern systems
engineering and is referred to by Goldberg as a methodology of innovation [6].
The core principles of the process are as follows: 

1. Decomposition: Decompose the large problem approximately and
intuitively, breaking into quasi-separate sub-problems (as separate
as possible).

2. Modeling: Investigate each sub-problem separately (or as
separate as possible) using empirical testing coupled with adequately
predictive, low-cost models.

3. Integration: Assemble the sub-solutions and test the overall
invention, paying attention to unforeseen interactions between the
sub-problems.

1 Goldberg is quick to point out that sets of equations do exist for various aspects
of flight, although no integrated mathematical proof for airplane flight exists.

Decomposition Problem decomposition and decomposition design is 
an axiom of reductionism and is at the very heart of problem solving 
in computer science. In the context of adaptive systems, one may
consider the base or medium on which the system is performing its
computation mechanisms, the so-called building blocks of information
processing. A structural decomposition may involve the architecture
and data structures of the system. Additionally, one may also consider 
a functional breakdown of mechanisms such as the operators applied at 
each discrete step of an algorithmic process. The reductions achieved 
provide the basis of investigation and modeling. 

Small Models Given the principle of the economy of modeling pre- 
sented as a spectrum, one may extend the description of each of the 
five presented model types. Small Models refers to the middle of the 
spectrum, specifically to the application of dimensional and facet-wise 
models. These are mid-range quantitative models that make accurate
predictions over a limited range of states at moderate cost. Once derived,
this class of models generally requires a small amount of formal
manipulation and large amounts of data for calibration and verification. The
following summarizes the modeling spectrum: 

• Unarticulated Wisdom: (low-cost, high-error) Intuition, what is 
used when there is nothing else. 

• Articulated Qualitative Models: Descriptions of mechanisms, graph- 
ical representations of processes and/or relationships, empirical 
observation or statistical data collection and analysis. 

• Dimensional Models: Investigate dimensionless parameters of the 
system. 

• Facet-wise Models: Investigation of a decomposition element of a 
model in relative isolation. 

• Equations of Motion: (high-cost, low-error) Differential equations 
and Markov chains.

Facet-wise models are an exercise in simple mathematics that may
be used to investigate a decomposition element of a model in relative 
isolation. They are based on the idea of bracketing high-order phe- 
nomena by simplifying or making assumptions about the state of the 
system. An example used by Goldberg from fluid mechanics is a series 
of equations that simplify the model by assuming that a fluid or gas has 
no viscosity, which matches no known substance. A common criticism of
this modeling approach is "system X doesn't work like that, the model is
unrealistic." The source of such concerns with adaptive systems is that
their interactions are typically high-dimensional and non-linear. Gold- 
berg's response is that for a given poorly understood area of research, 
any 'useful' model is better than no model. Dimensional analysis or the 
so-called dimensional reasoning and scaling laws are another common 
conceptual tool in engineering and the sciences. Such models may be 
used to investigate dimensionless parameters of the system, which may 
be considered the formalization of the systemic behaviors. 
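As an illustration of bracketing, consider a selection-only facet of a Genetic Algorithm in which crossover and mutation are assumed away. Under tournament selection with replacement, if a proportion p of the population are copies of the best individual, a tournament of size k contains at least one such copy with probability 1 - (1 - p)^k. The Ruby sketch below iterates this recurrence to estimate how many generations selection alone needs for the best individual to dominate. The function names, the population sizes, and the 0.99 threshold are assumptions chosen for illustration and are not taken from Goldberg's text.

```ruby
# Facet-wise model: expected proportion of the best individual after one
# generation of tournament selection (selection only, no variation).
def next_proportion(proportion, tournament_size = 2)
  1.0 - (1.0 - proportion)**tournament_size
end

# Generations until the modeled proportion reaches the threshold,
# starting from a single copy of the best individual.
def takeover_generations(pop_size, tournament_size = 2, threshold = 0.99)
  proportion = 1.0 / pop_size
  generations = 0
  while proportion < threshold
    proportion = next_proportion(proportion, tournament_size)
    generations += 1
  end
  generations
end

if __FILE__ == $0
  [10, 100, 1000].each do |n|
    puts "pop_size=#{n} generations=#{takeover_generations(n)}"
  end
end
```

The model is unrealistic in exactly the way the criticism above anticipates, since a real Genetic Algorithm also applies variation operators, yet it still gives a quick estimate of how selection pressure scales with population size.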

Integration Integration is a unification process of combining the
findings of various models together to form a patch-quilt coherent theory
of the system. Integration is not limited to holistic unification, and
one may address specific hypotheses regarding the system, resulting in
conclusions about existing systems and design decisions pertaining to
the next generation of systems.

Application In addition to elucidating the methodology, Goldberg 
specifies a series of five useful heuristics for the application of the 
methodology (taken from [5], page 8): 

1. Keep the goal of a working conceptual machine in mind.
Experimenters commonly get sidetracked by experimental design and
statistical verification; theoreticians get sidetracked with notions
of mathematical rigor and model fidelity.

2. Decompose the design ruthlessly. One cannot address the analytical 
analysis of a system like a Genetic Algorithm in one big 'gulp'. 

3. Use facet-wise models with almost reckless abandon. One should 
build easy models that can be solved by bracketing everything 
that gets in the way. 

4. Integrate facet-wise models using dimensional arguments. One can 
combine many small models together in a patch-quilt manner and 
defend the results of such models using dimensional analysis. 

5. Build high-order models when small models become inadequate.
Add complexity to models as complexity is needed (economy of
modeling).

9.3 Testing Algorithms

This section provides an introduction to software testing and the testing 
of Artificial Intelligence algorithms. Section 9.3.1 introduces software 
testing and focuses on a type of testing relevant to algorithms called 
unit testing. Section 9.3.2 provides a specific example of an algorithm 
and a prepared suite of unit tests, and Section 9.3.3 provides some 
rules-of-thumb for testing algorithms in general. 

9.3.1 Software Testing 

Software testing in the field of Software Engineering is a process in 
the life-cycle of a software project that verifies that the product or 
service meets quality expectations and validates that software meets 
the requirements specification. Software testing is intended to locate 
defects in a program, although a given testing method cannot guarantee 
to locate all defects. As such, it is common for an application to be 
subjected to a range of testing methodologies throughout the software 
life-cycle, such as unit testing during development, integration testing 
once modules and systems are completed, and user acceptance testing 
to allow the stakeholders to determine if their needs have been met. 

Unit testing is a type of software testing that involves the preparation 
of well-defined procedural tests of discrete functionality of a program 
that provide confidence that a module or function behaves as intended. 
Unit tests are referred to as 'white-box' tests (contrasted to
'black-box' tests) because they are written with full knowledge of the
internal structure of the functions and modules under test. Unit tests are
typically prepared by the developer that wrote the code under test and
are commonly automated, themselves written as small programs
that are executed by a unit testing framework (such as JUnit for Java or
the Test framework in Ruby). The objective is not to test each path of
execution within a unit (called complete-test or complete-code coverage),
but instead to focus tests on areas of risk, uncertainty, or criticality.
Each test focuses on one aspect of the code (test one thing) and tests are
commonly organized into test suites of commonality.

Some of the benefits of unit testing include: 

• Documentation: The preparation of a suite of tests for a given
system provides a type of programming documentation highlighting
the expected behavior of functions and modules and providing
examples of how to interact with key components.

• Readability: Unit testing encourages a programming style of small
modules, clear input and output, and fewer inter-component
dependencies. Code written for ease of testing (testability) may be
easier to read and follow.



• Regression: Together, the suite of tests can be executed as a 
regression-test of the system. The automation of the tests means 
that any defects caused by changes to the code can easily be 
identified. When a defect is found that slipped through, a new 
test can be written to ensure it will be identified in the future. 

Unit tests were traditionally written after the program was completed. 
A popular alternative is to prepare the tests before the functionality of 
the application is prepared, called Test-First or Test-Driven Development 
(TDD). In this method, the tests are written and executed, failing until 
the application functionality is written to make the test pass. The early 
preparation of tests allows the programmer to consider the behavior
required from the program and the interfaces and functions the program 
needs to expose before they are written. 

The concerns of software testing are very relevant to the development, 
investigation, and application of Metaheuristic and Computational In- 
telligence algorithms. In particular, the strong culture of empirical 
investigation and prototype-based development demands a baseline level 
of trust in the systems that are presented in articles and papers. Trust 
can be instilled in an algorithm by assessing the quality of the algorithm 
implementation itself. Unit testing is lightweight (requiring only the 
writing of automated test code) and meets the needs of promoting qual- 
ity and trust in the code while prototyping and developing algorithms. 
It is strongly suggested as a step in the process of empirical algorithm 
research in the fields of Metaheuristics, Computational Intelligence, and 
Biologically Inspired Computation. 

9.3.2 Unit Testing Example 

This section provides an example of an algorithm and its associated unit 
tests as an illustration of the presented concepts. The implementation 
of the Genetic Algorithm is discussed from the perspective of algorithm 
testing and an example set of unit tests for the Genetic Algorithm 
implementation are presented as a case study. 

Algorithm 

Listing 3.1 in Section 3.2 provides the source code for the Genetic
Algorithm in the Ruby Programming Language. An important consideration
when using the Ruby test framework is ensuring that the functions of
the algorithm are exposed for testing and that the algorithm
demonstration itself does not execute. This is achieved through the use
of the (if __FILE__ == $0) condition, which ensures the example only
executes when the file is called directly, allowing the functions to be
imported and executed independently by a unit test script. The algorithm
is very modular with its behavior partitioned into small functions, most
of which are independently testable.
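A minimal sketch of the guard, using a hypothetical file that defines only a onemax objective function. The file name and the body of the demonstration are assumptions; the structure mirrors the convention described above rather than reproducing Listing 3.1.

```ruby
# genetic_algorithm_sketch.rb (hypothetical file name)

# Objective function: count the number of '1' characters in a bitstring.
def onemax(bitstring)
  bitstring.count("1")
end

# The demonstration runs only when this file is executed directly,
# for example with: ruby genetic_algorithm_sketch.rb
# It is skipped when a test script loads the file with require or load.
if __FILE__ == $0
  puts "onemax('1010') = #{onemax('1010')}"
end
```

A unit test script can then load the file and call onemax without triggering the demonstration.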

The reproduce function has some dependencies, although its
orchestration of sub-functions is still testable. The search function is
the only monolithic function; it depends on all other functions in
the implementation (directly or indirectly) and hence is difficult to unit
test. At best, the search function may be a case for system testing
addressing functional requirements, such as "does the algorithm deliver
optimized solutions".

Unit Tests 

Listing 9.3 provides the TC_GeneticAlgorithm class that makes use of 
the built-in Ruby unit testing framework by extending the TestCase 
class. The listing provides an example of ten unit tests for six of the 
functions in the Genetic Algorithm implementation. Two types of unit 
tests are provided: 

• Deterministic: Directly test the function in question, address- 
ing questions such as: does onemax add correctly? and does 
point_mutation behave correctly? 

• Probabilistic: Test the probabilistic properties of the function in
question, addressing questions such as: does random_bitstring
provide an expected 50/50 mixture of 1s and 0s over a large number
of cases? and does point_mutation make an expected number of
changes over a large number of cases?

Tests for probabilistic expectations are a weaker form of unit
testing that can be used either to provide additional confidence in
deterministically tested functions, or as a last resort when
direct methods cannot be used.
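One way to reason about the tolerance used in a probabilistic test is to compare it with the standard deviation of the quantity being measured. For a bitstring of n independent bits, each set with probability q, the observed proportion of ones has standard deviation sqrt(q(1 - q) / n). The sketch below is an illustrative calculation under that independence assumption; the helper names are hypothetical.

```ruby
# Standard deviation of the observed proportion of ones in n independent
# bits, each of which is '1' with probability prob.
def proportion_std_dev(n, prob = 0.5)
  Math.sqrt(prob * (1.0 - prob) / n)
end

# How many standard deviations wide a given assertion tolerance is.
def tolerance_in_std_devs(delta, n, prob = 0.5)
  delta / proportion_std_dev(n, prob)
end

if __FILE__ == $0
  puts proportion_std_dev(1000)
  puts tolerance_in_std_devs(0.05, 1000)
end
```

With n = 1000 and a tolerance of 0.05, the tolerance is a little over three standard deviations, so a correct implementation will still fail such a test occasionally. A larger sample or a wider tolerance reduces the rate of these false alarms at the cost of a slower or less sensitive test.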

Given that a unit test should 'test one thing', it is common for a given
function to have more than one unit test. The reproduce function is a
good example of this with three tests in the suite. This is because it is
a larger function whose behavior, delegated to dependent functions,
varies based on its parameters.

require "test/unit"
# The Genetic Algorithm functions from Listing 3.1 must be loaded before
# these tests run (for example with require or load).

class TC_GeneticAlgorithm < Test::Unit::TestCase

  # test that the objective function behaves as expected
  def test_onemax
    assert_equal(0, onemax("0000"))
    assert_equal(4, onemax("1111"))
    assert_equal(2, onemax("1010"))
  end

  # test the creation of random strings
  def test_random_bitstring
    assert_equal(10, random_bitstring(10).size)
    assert_equal(0, random_bitstring(10).delete('0').delete('1').size)
  end

  # test the approximate proportion of 1's and 0's
  def test_random_bitstring_ratio
    s = random_bitstring(1000)
    assert_in_delta(0.5, (s.delete('1').size/1000.0), 0.05)
    assert_in_delta(0.5, (s.delete('0').size/1000.0), 0.05)
  end

  # test that members of the population are selected
  def test_binary_tournament
    pop = Array.new(10) {|i| {:fitness=>i} }
    10.times {assert(pop.include?(binary_tournament(pop)))}
  end

  # test point mutations at the limits
  def test_point_mutation
    assert_equal("0000000000", point_mutation("0000000000", 0))
    assert_equal("1111111111", point_mutation("1111111111", 0))
    assert_equal("1111111111", point_mutation("0000000000", 1))
    assert_equal("0000000000", point_mutation("1111111111", 1))
  end

  # test that the observed changes approximate the intended probability
  def test_point_mutation_ratio
    changes = 0
    100.times do
      s = point_mutation("0000000000", 0.5)
      # count the bits that were flipped from 0 to 1
      changes += (10 - s.delete('1').size)
    end
    assert_in_delta(0.5, changes.to_f/(100*10), 0.05)
  end

  # test cloning with crossover
  def test_crossover_clone
    p1, p2 = "0000000000", "1111111111"
    100.times do
      s = crossover(p1, p2, 0)
      assert_equal(p1, s)
      assert_not_same(p1, s)
    end
  end

  # test recombination with crossover
  def test_crossover_recombine
    p1, p2 = "0000000000", "1111111111"
    100.times do
      s = crossover(p1, p2, 1)
      assert_equal(p1.size, s.size)
      assert_not_equal(p1, s)
      assert_not_equal(p2, s)
      # every position must come from one of the two parents
      s.size.times {|i| assert( (p1[i]==s[i]) || (p2[i]==s[i]) ) }
    end
  end

  # test odd sized population
  def test_reproduce_odd
    pop = Array.new(9) {|i| {:fitness=>i, :bitstring=>"0000000000"} }
    children = reproduce(pop, pop.size, 0, 1)
    assert_equal(9, children.size)
  end

  # test reproduce size mismatch
  def test_reproduce_mismatch
    pop = Array.new(10) {|i| {:fitness=>i, :bitstring=>"0000000000"} }
    children = reproduce(pop, 9, 0, 0)
    assert_equal(9, children.size)
  end
end

Listing 9.3: Unit Tests for the Genetic Algorithm in Ruby

9.3.3 Rules-of-Thumb 

Unit testing is easy, although writing good unit tests is difficult given the 
complex relationship the tests have with the code under test. Testing 
Metaheuristics and Computational Intelligence algorithms is harder 
again given their probabilistic nature and their ability to 'work in spite 
of you', that is, provide some kind of result even when implemented 
with defects. 

The following guidelines may help when unit testing an algorithm: 

• Start Small: Some unit tests are better than no unit test and each 
additional test can improve the trust and the quality of the code. 
For an existing algorithm implementation, start by writing a test 
for a small and simple behavior and slowly build up a test suite. 

• Test one thing: Each test should focus on verifying the behavior
of one aspect of one unit of code. Writing concise and
behavior-focused unit tests is the objective of the methodology.

• Test once: A behavior or expectation only needs to be tested once;
do not repeat a test each time a given unit is tested.

• Don't forget the I/O: Remember to test the inputs and outputs of 
a unit of code, specifically the pre-conditions and post-conditions. 
It can be easy to focus on the decision points within a unit and 
forget its primary purpose. 

• Write code for testability: The tests should help to shape the code 
they test. Write small functions or modules, think about testing 
while writing code (or write tests first), and refactor code (update 
code after the fact) to make it easier to test. 

• Function independence: Attempt to limit the direct dependence
between functions, modules, objects and other constructs. This is
related to testability and writing small functions, although it
suggests limits on how much interaction there is between units of
code in the algorithm. Less dependence means fewer side effects of
a given unit of code and ultimately less complicated tests.

• Test Independence: Tests should be independent of each other.
Frameworks provide hooks to set up and tear down state prior to
the execution of each test, so there should be no need to have one
test prepare data or state for other tests. Tests should be able to
execute independently and in any order.

• Test your own code: Avoid writing tests that verify the behavior 
of framework or library code, such as the randomness of a random 
number generator or whether a math or string function behaves 
as expected. Focus on writing tests for the manipulation of data
performed by the code you have written. 

• Probabilistic testing: Metaheuristics and Computational Intelli- 
gence algorithms generally make use of stochastic or probabilistic 
decisions. This means that some behaviors are not deterministic 
and are more difficult to test. As with the example, write prob- 
abilistic tests to verify that such processes behave as intended. 
Given that probabilistic tests are weaker than deterministic tests, 
consider writing deterministic tests first. A probabilistic behavior 
can be made deterministic by replacing the random number gener- 
ator with a proxy that returns deterministic values, called a mock. 
This level of testing may require further changes to the original
code to allow for dependent modules and objects to be mocked.

• Consider test-first: Writing the tests first can help to crystallize 
expectations when implementing an algorithm from the literature, 
and help to solidify thoughts when developing or prototyping a 
new idea. 
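
The mock-based approach described under probabilistic testing can be
sketched as follows. This is a minimal illustration under stated
assumptions rather than code from the listings in this chapter: the rng
parameter and the MockRandom class are hypothetical names introduced
here so that the random source can be swapped out in a test.

```ruby
# Point mutation that takes its random source as an argument.
# The rng parameter is an assumption made for testability.
def point_mutation(bitstring, rate, rng=Random.new)
  flipped = bitstring.chars.map do |bit|
    (rng.rand < rate) ? ((bit == '1') ? '0' : '1') : bit
  end
  return flipped.join
end

# Hypothetical mock that replays a fixed sequence in place of random numbers.
class MockRandom
  def initialize(values)
    @values = values
    @index = 0
  end

  def rand
    value = @values[@index % @values.size]
    @index += 1
    return value
  end
end

# Values below the rate flip a bit; values at or above it leave the bit alone.
mock = MockRandom.new([0.0, 0.9])
puts point_mutation("0000", 0.5, mock)
```

Because the mock decides exactly which bits are flipped, a test can
assert the complete output string instead of a statistical property.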

9.3.4 References 

For more information on software testing, consult a good book on
software engineering. Two good books dedicated to testing are "Beautiful
Testing: Leading Professionals Reveal How They Improve Software", which
provides a compendium of best practices from professional programmers
and testers [2], and "Software testing" by Patton, which provides a more
traditional treatment [4].

Unit testing is covered in good books on software engineering or
software testing. Two good books that focus on unit testing are
"Test Driven Development: By Example" on the TDD methodology by
Beck, a pioneer of Extreme Programming and Test Driven Development
[1], and "Pragmatic unit testing in Java with JUnit" by Hunt and Thomas
[3].

9.4 Visualizing Algorithms

This section considers the role of visualization in the development and
application of algorithms from the fields of Metaheuristics, Computational
Intelligence, and Biologically Inspired Computation. Visualization can
be a powerful technique for exploring the spatial relationships between
data (such as an algorithm's performance over time) and a useful
investigatory tool (such as plotting an objective problem domain or
search space). Visualization can also provide a weak form of algorithm
testing, providing observations of efficiency or efficacy that may be
indicative of the expected algorithm behavior.

This section provides a discussion of the techniques and methods 
that may be used to explore and evaluate the problems and algorithms 
described throughout this book. The discussion and examples in this 
section are primarily focused on function optimization problems, al- 
though the principles of visualization as exploration (and a weak form 
of algorithm testing) are generally applicable to function approximation 
problem instances. 

9.4.1 Gnuplot 

Gnuplot is a free open source command line tool used to generate plots 
from data. It supports a large number of different plot types and provides 
seemingly limitless configurability. Plots are shown to the screen by 
default, but the tool can easily be configured to generate image files as 
well as LaTeX, PostScript and PDF documents.

Gnuplot can be downloaded from the website 2 that also provides 
many demonstrations of different plot types with sample scripts showing 
how the plots were created. There are many tutorials and examples on 
the web, and help is provided inside the Gnuplot software by typing 
help followed by the command name (for example: help plot). For 
a more comprehensive reference on Gnuplot, see Janert's introductory 
book to the software, "Gnuplot in Action" [1]. 

Gnuplot was chosen for the demonstrations in this section as useful 
plots can be created with a minimum number of commands. Additionally, 
it is easily integrated into a range of scripting languages and is supported
on a range of modern operating systems. All examples in this section 
include both the resulting plot and the script used to generate it. The 
scripts may be typed directly into the Gnuplot interpreter or into a file 
which is processed by the Gnuplot command line tool. The examples in 
this section provide a useful starting point for visualizing the problems 
and algorithms described throughout this book. 



2 Gnuplot URL: http://www.gnuplot.info 

9.4.2 Plotting Problems

The visualization of the problem under study is an excellent start in 
learning about a given domain. A simple spatial representation of the 
search space or objective function can help to motivate the selection 
and configuration of an appropriate technique. 

The visualization method is specific to the problem type and in- 
stance being considered. This section provides examples of visualizing 
problems from the fields of continuous and combinatorial function opti- 
mization, two classes of problems that appear frequently in the described 
algorithms. 

Continuous Function Optimization 

A continuous function optimization problem is typically visualized in two
dimensions as a line where x = input, y = f(input) or three dimensions
as a surface where x, y = input, z = f(input).

Some functions may have many more dimensions, which if the function
is linearly separable can be visualized in lower dimensions. Functions
that are not linearly separable may be able to make use of projection
techniques such as Principal Component Analysis (PCA). For example,
one can prepare a stratified sample of the search space as vectors with
associated cost function values and use PCA to project the vectors onto
a two-dimensional plane for visualization.

Similarly, the range of each variable input to the function may be 
large. This may mean that some of the complexity or detail may be 
lost when the function is visualized as a line or surface. An indication 
of this detail may be achieved by creating spot-sample plots of narrow 
sub-sections of the function. 

Figure 9.1 provides an example of the Basin function in one dimension.
The Basin function is a continuous function optimization problem that
seeks $\min f(x)$ where $f = \sum_{i=1}^{n} x_i^{2}$,
$-5.0 \leq x_i \leq 5.0$. The optimal solution for this function is
$(v_0, \ldots, v_{n-1}) = 0.0$. Listing 9.4 provides the Gnuplot
script used to prepare the plot ($n = 1$).

set xrange [-5:5]
plot x*x

Listing 9.4: Gnuplot script for plotting a function in one-dimension. 
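
The Basin function plotted above can also be evaluated in code, which
is convenient for producing spot samples of a narrow sub-section of the
domain. The following is a minimal sketch: sample_function and its
num_points parameter are hypothetical names chosen for illustration.

```ruby
# Basin function: the sum of squared components, minimized at the origin.
def objective_function(vector)
  return vector.inject(0.0) {|sum, x| sum + (x ** 2.0)}
end

# Evaluate the one-dimensional function at evenly spaced points in [min, max].
# Returns an array of [x, f(x)] pairs (num_points is an illustrative parameter).
def sample_function(min, max, num_points)
  step = (max - min).to_f / (num_points - 1)
  return (0...num_points).map do |i|
    x = min + (i * step)
    [x, objective_function([x])]
  end
end

# Print one "x y" pair per line, a layout Gnuplot can read as a data file.
samples = sample_function(-5.0, 5.0, 11)
samples.each {|x, y| puts "#{x} #{y}"}
```

Narrowing the min and max arguments samples a smaller region at a
finer resolution, which can reveal detail that is lost when the whole
range is drawn at once.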

Figure 9.2 provides an example of the Basin function in two dimensions
as a three-dimensional surface plot. Listing 9.5 provides the Gnuplot
script used to prepare the surface plot.

set xrange [-5:5]
set yrange [-5:5]
set zrange [0:50]
splot x*x+y*y

Listing 9.5: Gnuplot script for plotting a function in two-dimensions.

Both plots show the optimum in the center of the domain at x = 0.0 
in one-dimension and x, y = 0.0 in two-dimensions. 



Traveling Salesman Problem 

The Traveling Salesman Problem (TSP) description is comprised of 
a list of cities, each with a different coordinate (at least in the case 
of the symmetric TSP). This can easily be visualized as a map if the 
coordinates are latitudes and longitudes, or as a scatter plot.

A second possible visualization is to prepare a distance matrix 
(distance between each point and all other points) and visualize the 
matrix directly, with each cell shaded relative to the distances of all 
other cells (largest distances darker and the shorter distances lighter). 
The light areas in the matrix highlight short or possible nearest-neighbor 
cities. 
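
The distance matrix described above could be prepared along the
following lines. This is a sketch under stated assumptions: the helper
names are hypothetical, Euclidean distance is assumed as the measure,
and the five cities are the first five coordinates of the Berlin52
instance used in this section.

```ruby
# Euclidean distance between two cities given as [x, y] pairs.
def euc_2d(c1, c2)
  return Math.sqrt((c1[0] - c2[0])**2.0 + (c1[1] - c2[1])**2.0)
end

# Square matrix of distances between every pair of cities.
def distance_matrix(cities)
  return cities.map {|a| cities.map {|b| euc_2d(a, b)}}
end

# One matrix row per line, values separated by spaces and rounded for brevity.
def matrix_to_text(matrix)
  return matrix.map {|row| row.map {|d| d.round(1)}.join(" ")}.join("\n")
end

cities = [[565.0, 575.0], [25.0, 185.0], [345.0, 750.0],
  [945.0, 685.0], [845.0, 655.0]]
puts matrix_to_text(distance_matrix(cities))
```

A file in this layout can be read by Gnuplot as a matrix (for example
with the matrix keyword and the image plotting style), with the palette
controlling whether larger distances are drawn darker or lighter.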

Figure 9.3 provides a scatter plot of the Berlin52 TSP used throughout
the algorithm descriptions in this book. The Berlin52 problem seeks
a permutation of the order to visit cities (called a tour) that minimizes
the total distance traveled. The optimal tour distance for Berlin52 is
7542 units.

Figure 9.3: Plot of the cities of the Berlin52 TSP.

Listing 9.6 provides the Gnuplot script used to prepare the plot,
where berlin52.tsp is a file that contains a listing of the coordinates of
all cities, one city per line separated by white space. Listing 9.7 provides
a snippet of the first five lines of the berlin52.tsp file.



plot "berlin52.tsp" 

Listing 9.6: Gnuplot script for plotting the Berlin52 TSP. 



565.0 575.0 
25.0 185.0 
345.0 750.0 
945.0 685.0 
845.0 655.0 



Listing 9.7: Snippet of the berlin52.tsp file. 

The scatter plot shows some clustering of points toward the middle 
of the domain as well as many points spaced out near the periphery of 
the plot. An optimal solution is not obvious from looking at the plot, 
although one can see the potential for nearest-neighbor heuristics and 
importance of structure preserving operations on candidate solutions. 

9.4.3 Plotting Algorithm Performance 

Visualizing the performance of an algorithm can give indications that 
it is converging (implemented correctly) and provide insight into its 
dynamic behavior. Many algorithms are very simple to implement but 
exhibit complex dynamic behavior that is difficult to model and predict 
beforehand. An understanding of such behavior and the effects of
changing an algorithm's parameters can be gained through systematic
and methodical investigation. Exploring parameter configurations
and plots of an algorithm's performance can give a quick first-pass
approximation of the algorithm's capability and potentially highlight
fruitful areas for focused investigation. 

Two quite different perspectives on visualizing algorithm performance 
are: a single algorithm run and a comparison between multiple algorithm 
runs. The visualization of algorithm runs is explored in this section in 
the context of the Genetic Algorithm applied to a binary optimization 
problem called OneMax (see Section 3.2). 

Single Algorithm Run 

The performance of an algorithm over the course of a single run can easily
be visualized as a line graph, regardless of the specific measures used.
The graph can be prepared after algorithm execution has completed,
although many algorithm frameworks provide dynamic line graphs.

Figure 9.4 provides an example line graph, showing the quality of the
best candidate solution located by the Genetic Algorithm each generation
for a single run applied to a 64-bit OneMax problem. Listing 9.8 provides
the Gnuplot script used to prepare the plot, where ga1.txt is a text file
that provides the fitness of the best solution each algorithm iteration
on a new line. Listing 9.9 provides a snippet of the first five lines of the
ga1.txt file.




Figure 9.4: Line graph of the best solution found by the Genetic Algorithm.



set yrange [45:64]
plot "ga1.txt" with linespoints

Listing 9.8: Gnuplot script for creating a line graph.

Listing 9.9: Snippet of the ga1.txt file.

Multiple Algorithm Runs

Multiple algorithm runs can provide insight into the tendency of an 
algorithm or algorithm configuration on a problem, given the stochastic 
processes that underlie many of these techniques. For example, a 
collection of the best results observed over a number of runs may be
taken as a distribution indicating the capability of an algorithm for 
solving a given instance of a problem. This distribution may be visualized 
directly. 

Figure 9.5 provides a histogram plot showing the best solutions found
and the number of times they were located by the Genetic Algorithm over
100 runs on a 300-bit OneMax function.

Figure 9.5: Histogram of the best solutions found by a Genetic Algorithm.

Listing 9.10 provides the Gnuplot script used to prepare the plot,
where ga2.histogram.txt is a text file that contains discrete fitness
values and the number of times each was discovered by the algorithm over
100 runs.

set yrange [0:17]
set xrange [275:290]
plot "ga2.histogram.txt" with boxes

Listing 9.10: Gnuplot script for creating a histogram.

Listing 9.11 provides a snippet of the first five lines of the
ga2.histogram.txt file.

276 3
277 3
278 3
279 14
280 11

Listing 9.11: Snippet of the ga2.histogram.txt file.
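
A file in the layout shown above can be produced by tallying the best
score from each run. The following is a minimal sketch: the function
names are hypothetical and the scores are made-up illustration values,
not results from the experiment described in this section.

```ruby
# Count how many runs finished with each best fitness value.
# Returns [score, count] pairs sorted by score.
def histogram(best_scores)
  counts = Hash.new(0)
  best_scores.each {|score| counts[score] += 1}
  return counts.sort
end

# One "score count" pair per line, matching the data file layout above.
def histogram_to_text(counts)
  return counts.map {|score, count| "#{score} #{count}"}.join("\n")
end

# Illustrative best scores from six hypothetical runs.
scores = [279, 276, 280, 279, 277, 279]
puts histogram_to_text(histogram(scores))
```

The resulting text can be saved to a file and plotted with the boxes
style as in the script above.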



Multiple Distributions of Algorithm Runs 

Algorithms can be compared against each other based on the distribu- 
tions of algorithm performance over a number of runs. This comparison 
usually takes the form of statistical tests that can make meaningful 
statements about the differences between distributions. A visualiza- 
tion of the relative difference between the distributions can aid in an 
interpretation of such statistical measures. 

A compact way of representing a distribution is to use a box-and-whisker
plot that partitions the data into quartiles, showing the central
tendency of the distribution, the middle mass of the data (the second and
third quartiles), the limits of the distribution and any outliers. Algorithm
run distributions may be summarized as box-and-whisker plots and
plotted together to spatially show relative performance relationships.

Figure 9.6 provides box-and-whisker plots of the best score distribu- 
tion of 100 runs for the Genetic Algorithm applied to a 300-bit OneMax 
problem with three different mutation configurations. The measure 
collected from each run was the quality of the best candidate solution 
found. 

Listing 9.12 provides the Gnuplot script used to prepare the plot,
where the file boxplots1.txt contains summaries of the results, one
configuration per line, with each line containing the min, first, second,
and third quartiles and the max values separated by a space. Listing 9.13
provides a complete listing of the three lines of the boxplots1.txt file.

set bars 15.0
set xrange [-1:3]
plot 'boxplots1.txt' using 0:2:1:5:4 with candlesticks whiskerbars 0.5

Listing 9.12: Gnuplot script for creating a Box-and-whisker plot. 

Listing 9.13: Complete listing of the boxplots1.txt file.

Figure 9.6: Box-and-whisker plots of the Genetic Algorithm's performance.
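
The five values on each line of the summary file (min, the three
quartiles, and max) can be computed from the raw results of a set of
runs. The sketch below is one possible approach under stated
assumptions: the function names are hypothetical, and quartiles are
estimated by linear interpolation between sorted values, which is only
one of several common definitions.

```ruby
# Value at a fractional position in a sorted array, using linear
# interpolation between the two neighboring elements.
def percentile(sorted, fraction)
  position = fraction * (sorted.size - 1)
  lower = sorted[position.floor]
  upper = sorted[position.ceil]
  return lower + (upper - lower) * (position - position.floor)
end

# Returns [min, first quartile, median, third quartile, max].
def five_number_summary(values)
  sorted = values.sort
  return [sorted.first, percentile(sorted, 0.25), percentile(sorted, 0.5),
    percentile(sorted, 0.75), sorted.last]
end

# Illustrative scores only; each configuration would contribute one line.
puts five_number_summary([4, 1, 3, 2]).join(" ")
```

Calling five_number_summary once per algorithm configuration and
writing each result on its own line gives a file in the layout the
candlesticks script expects.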

9.4.4 Plotting Candidate Solutions 

Visualizing candidate solutions can provide an insight into the com- 
plexity of the problem and the behavior of an algorithm. This section 
provides examples of visualizing candidate solutions in the context of 
their problem domains from both continuous and combinatorial function 
optimization. 

Continuous Function Optimization 

Visualizing candidate solutions from a continuous function optimization
domain at periodic times over the course of a run can provide an
indication of the algorithm's behavior in moving through a search space.
In low dimensions (such as one or two dimensions) this can provide
qualitative insights into the relationship between algorithm
configurations and behavior.

Figure 9.7 provides a plot of the best solution found each iteration 
by the Particle Swarm Optimization algorithm on the Basin function 
in two dimensions (see Section 6.2). The positions of the candidate 
solutions are projected on top of a heat map of the Basin function in 
two-dimensions, with the gradient representing the cost of solutions at 
each point. Listing 9.14 provides the Gnuplot script used to prepare the
plot, where pso1.txt is a file that contains the coordinates of the best
solution found by the algorithm, with one coordinate per line separated
by a space. Listing 9.15 provides a snippet of the first five lines of the
pso1.txt file.




Figure 9.7: Heat map plot showing selected samples in the domain.



set xrange [-5:5]
set yrange [-5:5]
set pm3d map
set palette gray negative
set samples 20
set isosamples 20
splot x*x+y*y, "pso1.txt" using 1:2:(0) with points

Listing 9.14: Gnuplot script used to create a heat map and selected
samples.



-3.9986483808224222 3.8910758979126956 31.12966051677087 
-3.838580364459159 3.266132168962991 25.402318559546302 
-3.678512348095896 2.6411884400132863 20.507329470753803 
-3.518444331732633 2.0162447110635817 16.44469325039336 
-3.35837631536937 1.391300982113877 13.214409898464986 



Listing 9.15: Snippet of the pso1.txt file.

Traveling Salesman Problem

Visualizing the results of a combinatorial optimization can provide insight
into the areas of the problem that a selected technique is handling well,
or poorly. Candidate solutions can be visualized over the course of a
run to observe how the complexity of solutions found by a technique
changes over time. Alternatively, the best candidate solutions can be
visualized at the end of a run.

Candidate solutions for the TSP are easily visualized as tours (order 
of city visits) in the context of the city coordinates of the problem 
definition. 

Figure 9.8 provides a plot of an example Nearest-Neighbor solution 
for the Berlin52 TSP. A Nearest-Neighbor solution is constructed by 
randomly selecting the first city in the tour then selecting the next 
city in the tour with the minimum distance to the current city until a 
complete tour is created. 

Figure 9.8: Plot of a Nearest-Neighbor tour for the Berlin52 TSP.

Listing 9.16 provides the Gnuplot script used to prepare the plot,
where berlin52.nn.tour is a file that contains a listing of the
coordinates of all cities separated by white space in the order that the
cities are visited, with one city per line. The first city in the tour is
repeated as the last city in the tour to provide a closed polygon in the
plot. Listing 9.17 provides a snippet of the first five lines of the
berlin52.nn.tour file.



plot "berlin52.nn.tour" with linespoints

Listing 9.16: Gnuplot script for plotting a tour for a TSP.



475 960 
525 1000 
510 875 
555 815 
575 665 



Listing 9.17: Snippet of the berlin52.nn.tour file. 
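
The Nearest-Neighbor construction and the tour file layout described
above can be sketched as follows. The function names are hypothetical,
Euclidean distance is assumed, and the five cities are the first five
coordinates of the Berlin52 instance rather than the full problem.

```ruby
# Euclidean distance between two cities given as [x, y] pairs.
def euc_2d(c1, c2)
  return Math.sqrt((c1[0] - c2[0])**2.0 + (c1[1] - c2[1])**2.0)
end

# Build a tour of city indices by repeatedly visiting the closest
# unvisited city, beginning at the city with index start.
def nearest_neighbor_tour(cities, start)
  tour = [start]
  unvisited = (0...cities.size).to_a - [start]
  until unvisited.empty?
    current = cities[tour.last]
    nearest = unvisited.min_by {|i| euc_2d(current, cities[i])}
    tour << nearest
    unvisited.delete(nearest)
  end
  return tour
end

# One city per line in visiting order, with the first city repeated at
# the end so that the plotted line closes into a polygon.
def tour_to_text(cities, tour)
  return (tour + [tour.first]).map {|i| cities[i].join(" ")}.join("\n")
end

cities = [[565.0, 575.0], [25.0, 185.0], [345.0, 750.0],
  [945.0, 685.0], [845.0, 655.0]]
tour = nearest_neighbor_tour(cities, rand(cities.size))
puts tour_to_text(cities, tour)
```

Because the starting city is chosen at random, repeated executions may
produce different tours of different lengths.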

Figure 9.9 provides a plot of the known optimal solution for the 
Berlin52 Traveling Salesman problem. 




Figure 9.9: Plot of the optimal tour for the Berlin52 TSP.

Listing 9.18 provides the Gnuplot script used to prepare the plot,
where berlin52.optimal is a file that contains a listing of the
coordinates of all cities in the order that the cities are visited, with
one city per line separated by white space. The first city in the tour is
repeated as the last city in the tour to provide a closed polygon in the
plot.

plot "berlin52.optimal" with linespoints

Listing 9.18: Gnuplot script for plotting a tour for a TSP.

Listing 9.19 provides a snippet of the first five lines of the
berlin52.optimal file.






565.0 575.0
605.0 625.0
575.0 665.0
555.0 815.0
510.0 875.0



Listing 9.19: Snippet of the berlin52.optimal file.

9.5 Problem Solving Strategies

The field of Data Mining has clear methodologies that guide a practi- 
tioner to solve problems, such as Knowledge Discovery in Databases 
(KDD) [16]. Metaheuristics and Computational Intelligence algorithms 
have no such methodology. 3 

This section describes some of the considerations when applying 
algorithms from the fields of Metaheuristics, Computational Intelligence, 
and Biologically Inspired Computation to practical problem domains. 
This discussion includes: 

• The suitability of application of a given technique to a given 
problem and the transferability of algorithm and problem features 
(Section 9.5.1).

• The distinction between strong and weak methods which use 
more or less problem specific information respectively, and the 
continuum between these extremes (Section 9.5.2). 

• A summary of problem solving strategies that suggest different 
ways of applying a given technique to the function optimization 
and approximation fields (Section 9.5.3). 

9.5.1 Suitability of Application 

From a problem-solving perspective, the tools that emerge from the field 
of Computational Intelligence are generally assessed with regard to their 
utility in efficiently or effectively solving problems. An important lesson
from the No-Free-Lunch Theorem was to bound claims of applicability
(see Section subsec:nfl), that is, to consider the suitability of a given
strategy with regard to the feature overlap with the attributes of a given 
problem domain. From a Computational Intelligence perspective, one 
may consider the architecture, processes, and constraints of a given 
strategy as the features of an approach. 

The suitability of the application of a particular approach to a
problem takes into consideration concerns such as the appropriateness (can
the approach address the problem), the feasibility (available resources
and related efficiency concerns), and the flexibility (ability to address
unexpected or unintended effects). This section summarizes a general
methodology toward addressing the problem of suitability in the
context of Computational Intelligence tools. This methodology involves 1)
the systematic elicitation of system and problem features, and 2) the
consideration of the problem-problem, algorithm-algorithm, and
problem-algorithm overlap of feature sets.



3 Some methods can be used for classification and regression and as such may fit 
into methodologies such as KDD. 






Systematic Feature Elicitation 

A feature of a system (tool, strategy, model) or a problem is a distinctive 
element or property that may be used to differentiate it from similar 
and/or related cases. Examples may include functional concerns such 
as: processes, data structures, architectures, and constraints, as well 
as emergent concerns that may have a more subjective quality such 
as general behaviors, organizations, and higher-order structures. The 
process of the elicitation of features may be taken from a system or 
problem perspective: 

• System Perspective: This requires a strong focus on the lower level
functional elements and investigations that work toward correlating
specific controlled procedures with predictable emergent
behaviors.

• Problem Perspective: May require both a generalization of the
specific case to the general problem as well as a functional
or logical decomposition into constituent parts.

Problem generalization and functional decomposition are important 
and commonly used patterns for problem solving in the broader fields 
of Artificial Intelligence and Machine Learning. The promotion of 
simplification and modularity can reduce the cost and complexity of 
achieving solutions [10, 43]. 

Feature Overlap 

Overlap in elicited features may be considered from three important 
perspectives: between systems, between problems, and between a system 
and a problem. Further, such overlap may be considered at different 
levels of detail with regard to generalized problem solving strategies and 
problem definitions. These overlap cases are considered as follows: 

• System Overlap defines the suitability of comparing one system 
to another, referred to as comparability. For example, systems 
may be considered for the same general problems and compared 
in terms of theoretical or empirical capability, the results of which 
may only be meaningful if the systems are significantly similar to 
each other as assessed in terms of feature overlap. 

• Problem Overlap defines the suitability of comparing one problem 
to another, referred to as transferability. From a systems focus, 
transferability refers to the capability of a technique on a given 
problem to be successfully applied to another problem, the result 
of which is only meaningful if there is a strong overlap between 
the problems under consideration. 

• System-Problem Overlap defines the suitability of a system on a
given problem, referred to as applicability. For example, a system is 
considered suitable for a given problem if it has a significant overlap 
in capabilities with the requirements of the problem definition. 

Such mappings are imprecise given the subjective assessment and
complexity required in both the elicitation of features and the
consideration of their overlap, the hardest of which is expected to be
the mapping between systems and problems. The mapping of salient features
of algorithms and problems was proposed as an important reconciliation of
the No-Free-Lunch Theorem by Wolpert and Macready [58], although the
important difference of this approach is that the system and algorithm 
are given prior to the assessment. In their first work on the theorem, 
Wolpert and Macready specifically propose the elicitation of the features 
from a problem-first perspective, for which specialized algorithms can 
be defined [57]. Therefore, this methodology of suitability may be 
considered a generalization of this reconciliation suitable for the altered 
Computational Intelligence (strategy first) perspective on Artificial 
Intelligence. 

9.5.2 Strong and Weak Methods 

Generally, the methods from the fields of Metaheuristics, Computational 
Intelligence, and Biologically Inspired Computation may be considered 
weak methods. They are general purpose and are typically considered 
black-box solvers for a range of problem domains. The stronger the 
method, the more that must be known about the problem domain. 
Rather than discriminating techniques into weak and strong it is more 
useful to consider a continuum of methods from pure black box
techniques that have few assumptions about the problem domain, to strong
methods that exploit most or all of the problem specific information 
available. 

For example, the Traveling Salesman Problem is an example of a
combinatorial optimization problem. A naive black box method (such as
Random Search) may simply explore permutations of the cities. Slightly
stronger methods may initialize the search with a heuristic-generated
solution (such as nearest neighbor) and explore the search space using
a variation method that also exploits heuristic information about the
domain (such as a 2-opt variation). Continuing along this theme, a
stochastic method may explore the search space using a combination of
probabilistic and heuristic information (such as Ant Colony Optimization
algorithms). At the other end of the scale the stochastic elements are
decreased or removed until one is left with pure heuristic methods such
as the Lin-Kernighan heuristic [31] and exact algorithms from linear
and dynamic programming that focus on the structure and nature of
the problem [55].
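
The weaker end of this continuum can be illustrated in code. The sketch
below contrasts a random permutation (no problem knowledge) with a
simple search that applies random 2-opt moves and keeps only
improvements. It is an illustration under stated assumptions, not one
of the listings from this book: all function names and the rng
parameter are hypothetical, and Euclidean distance is assumed.

```ruby
# Euclidean distance between two cities given as [x, y] pairs.
def euc_2d(c1, c2)
  return Math.sqrt((c1[0] - c2[0])**2.0 + (c1[1] - c2[1])**2.0)
end

# Length of the closed tour that visits cities in permutation order.
def cost(permutation, cities)
  distance = 0.0
  permutation.each_with_index do |c1, i|
    c2 = (i == permutation.size - 1) ? permutation[0] : permutation[i + 1]
    distance += euc_2d(cities[c1], cities[c2])
  end
  return distance
end

# Weak method: a random ordering that uses no domain knowledge.
def random_permutation(cities, rng)
  return (0...cities.size).to_a.shuffle(random: rng)
end

# 2-opt move: reverse the segment of the tour between positions i and j.
def two_opt(permutation, i, j)
  perm = permutation.dup
  perm[i..j] = perm[i..j].reverse
  return perm
end

# Slightly stronger method: try random 2-opt moves, accept improvements only.
def two_opt_search(cities, start, max_iter, rng)
  best, best_cost = start, cost(start, cities)
  max_iter.times do
    i, j = [rng.rand(best.size), rng.rand(best.size)].sort
    next if i == j
    candidate = two_opt(best, i, j)
    candidate_cost = cost(candidate, cities)
    best, best_cost = candidate, candidate_cost if candidate_cost < best_cost
  end
  return [best, best_cost]
end

rng = Random.new(1)
cities = [[565.0, 575.0], [25.0, 185.0], [345.0, 750.0],
  [945.0, 685.0], [845.0, 655.0]]
start = random_permutation(cities, rng)
best, best_cost = two_opt_search(cities, start, 100, rng)
puts "random=#{cost(start, cities)}, after 2-opt=#{best_cost}"
```

Replacing the random starting permutation with a heuristic-generated
tour would move the same search one step further toward the strong end
of the continuum.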

Approaching a problem is not as simple as selecting the strongest 
method available and solving it. The following describes two potential 
strategies: 

• Start Strong: Select the strongest technique available and apply it 
to the problem. Difficult problems can be resistant to traditional 
methods for many intrinsic and extrinsic reasons. Use products 
from a strong technique (best solution found, heuristics) to seed 
the next weaker method in line. 

• Start Weak: Strong methods do not exist for all problems, and if 
they do exist, the computation, skill, and/or time resources may 
not be available to exploit them. Start with a weak technique and 
use it to learn about the problem domain. Use this information 
to make better decisions about subsequent techniques to try that 
can exploit what has been learned. 

In a real-world engineering or business scenario, the objective is to 
solve a problem or achieve the best possible solution to the problem 
within the operating constraints. Concerns of algorithm and technique 
purity become less important than they may be in their respective 
fields of research. Both of the above strategies suggest an iterative 
methodology, where the product or knowledge gained from one technique 
may be used to prime a subsequent stronger or weaker technique. 

9.5.3 Domain-Specific Strategies 

An algorithm may be considered a strategy for problem solving. There 
are a wide range of ways in which a given algorithm can be used to solve 
a problem. Function Optimization and Function Approximation were 
presented as two general classes of problems to which the algorithms from 
the fields of Metaheuristics, Computational Intelligence, and Biologically 
Inspired Computation are applied. This section reviews general
problem solving strategies that may be adopted for a given technique in
each of these general problem domains. 

Function Optimization 

This section reviews a select set of strategies for addressing optimization 
problems from the field of Metaheuristics and Computational Intelligence 
to provide general insight into the state of the interaction between 
stochastic algorithms and the field of optimization. This section draws 
heavily from the fields of Evolutionary Computation, Swarm Intelligence,
and related Computational Intelligence sub-fields. 

Global and Local Optimization: Global Optimization refers to
seeking a globally optimal structure or approximation thereof in a given
problem domain. Global is differentiated from Local Optimization
in that the latter focuses on locating an optimal structure within a
constrained region of the decision variable search space, such as a single
peak or valley (basin of attraction). In the literature, global optimization
problems refer to the class of optimization problems that generally
cannot be addressed through more conventional approaches such as
gradient descent methods (that require mathematical derivatives) and
pattern search (that can get 'stuck' in local optima and never converge)
[41, 53].

A global search strategy provides the benefit of making few if any 
assumptions about where promising areas of the search space may be, 
potentially highlighting unintuitive combinations of parameters. A local 
search strategy provides the benefit of focus and refinement of an existing 
candidate solution. It is common to apply a local search method to the 
solutions located by a global search procedure as a refinement strategy 
(such as using a Hill Climber (Section 2.4) after a Genetic Algorithm 
(Section 3.2)), and some methods have both techniques built in (such as 
GRASP in Section 2.8). 
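
As a minimal sketch of the refinement strategy described above, the
following Ruby example runs a random search (standing in for a global
method) and then a simple stochastic hill climber from the best point
found. The objective function, bounds, step size, and iteration counts
are assumptions chosen for illustration; they are not taken from the
algorithm listings elsewhere in the book.

```ruby
# Cost function to minimize: sum of squares (an illustrative choice).
def objective(vector)
  vector.sum { |x| x**2 }
end

# Global phase: uniform random sampling over the whole search space.
def random_search(bounds, samples, rng)
  best = nil
  samples.times do
    candidate = bounds.map { |min, max| min + (max - min) * rng.rand }
    best = candidate if best.nil? || objective(candidate) < objective(best)
  end
  best
end

# Local phase: stochastic hill climbing with a small fixed step size.
def hill_climb(start, bounds, step, iterations, rng)
  current = start
  iterations.times do
    neighbor = current.each_with_index.map do |x, i|
      min, max = bounds[i]
      (x + step * (2.0 * rng.rand - 1.0)).clamp(min, max)
    end
    # Accept the neighbor only if it is no worse than the current point.
    current = neighbor if objective(neighbor) <= objective(current)
  end
  current
end

rng = Random.new(1)
bounds = Array.new(2) { [-5.0, 5.0] }
coarse = random_search(bounds, 50, rng)
refined = hill_climb(coarse, bounds, 0.1, 500, rng)
puts "global: #{objective(coarse)}, refined: #{objective(refined)}"
```

Because the hill climber only accepts moves that do not increase the
cost, the refined solution is never worse than the one handed over by
the global phase.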

Parallel Optimization A natural step toward addressing difficult
problems (large and rugged cost landscapes) is to exploit parallel and distributed
hardware, to get an improved result in the same amount of time, the 
same result in less time, or both [12]. Towards unifying the myriad 
of approaches and hardware configurations, a general consensus and 
taxonomy has been defined by the Parallel Evolutionary Algorithms 
(PEA) and Parallel Metaheuristics fields that considers the ratio of 
communication to computation called granularity [4, 11]. 

This taxonomy is presented concisely by Alba and Tomassini as a plot 
or trade-off of three concerns: 1) the number of sub-populations (models 
or parallel strategies working on the problem), 2) the coupling between 
the sub-populations (frequency and amplitude of communication), and
3) the size of the sub-populations (size or extent of the sub-models) [5].

Two important and relevant findings from the narrower field of 
Parallel Evolutionary Algorithms include 1) that tight coupling (frequent 
inter-system migration of candidate solutions) between coarse-grained 
models typically results in worse performance than a non-distributed 
approach [6], and 2) that loose coupling (infrequent migration) between 
coarse-grained models has been consistently shown to provide a super- 
linear increase in performance [3, 7, 11]. 
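
To make the notion of coupling concrete, the following is a small
sketch of one migration step between coarse-grained sub-populations
(islands). The ring topology, the best-replaces-worst policy, and the
function name migrate are assumptions made for illustration only. In
such a scheme the coupling is loose when this step is called rarely
(for example once every few dozen generations) and tight when it is
called every generation.

```ruby
# Ring migration: each island sends a copy of its best member to the next
# island, where it replaces that island's worst member (minimizing cost).
def migrate(islands, cost)
  emigrants = islands.map { |population| population.min_by { |c| cost.call(c) } }
  islands.each_with_index.map do |population, i|
    incoming = emigrants[(i - 1) % islands.size]
    worst_index = population.each_index.max_by { |j| cost.call(population[j]) }
    updated = population.dup
    updated[worst_index] = incoming
    updated
  end
end

cost = ->(x) { x * x }
islands = [[1.0, 5.0], [2.0, 9.0], [3.0, 7.0]]
migrated = migrate(islands, cost)
p migrated
```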

Cooperative Search This is a more general approach that considers 
the use of multiple models that work together to address difficult
optimization problems. Durfee et al. consider so-called Cooperative
Distributed Problem Solving (CDPS) in which a network of loosely 
coupled solvers are employed to address complex distributed problems. 
In such systems, it is desirable to match the processing capabilities of the 
solver to the attributes of the problem. For example, a given problem 
may have spatially distributed, functionally distributed, or temporally 
distributed sub-problems to which a centralized and monolithic system 
may not be suitable. 

Lesser [30] considers CDPS and proposes that such models perform
distributed search on dependent or independent and potentially overlapping
sub-problems as a motivating perspective for conducting research into
Distributed Artificial Intelligence (DAI) 4. Lesser points out that in real
world applications, it is hard to get an optimal mapping between the
allocated resources and the needs or availability of information for a given
problem, suggesting that such problems may be caused by a mismatch
in processing times and/or number of sub-problems, interdependencies
between sub-problems, and local experts whose expertise cannot be
effectively communicated. For more detail on the relationships between
parallel and cooperative search, El-Abd and Kamel provide a rigorous
taxonomy [15].

Hybrid Search Hybrid Search is a perspective on optimization that 
focuses on the use of multiple and likely different approaches either 
sequentially (as in the canonical global and local search case), or in 
parallel (such as in Cooperative Search). For example in this latter 
case, it is common in the field of PEA to encourage different levels of 
exploration and exploitation across island populations by varying the 
operators or operator configurations used [2, 51]. 

Talbi proposed a detailed 4-level taxonomy of Hybrid Metaheuristics 
that concerns parallel and cooperating approaches [50]. The taxonomy 
encompasses parallel and cooperative considerations for optimization 
and focuses on the discriminating features in the lowest level such as 
heterogeneity, and specialization of approaches. 

Functional Decomposition Three examples of a functional decom- 
position of optimization include 1) multiple objectives, 2) multiple 
constraints, and 3) partitions of the decision variable search space. 

4 This perspective provided the basis for what became the field of Multi-Agent
Systems (MAS).

Multi-Objective Optimization (MOO) is a sub-field that is concerned
with the optimization of two or more objective functions. A solution to
a MOO problem conventionally involves locating and returning a set of
candidate solutions called the non-dominated set [13]. The Pareto optimal
set is the set of optimal non-dominated solutions. For a given problem no
feasible solution exists that dominates a Pareto optimal solution. All
solutions that are Pareto optimal belong to the Pareto set, and the
points that these solutions map to in the objective space are called the
Pareto front. The complexity with MOO problems is in the typically
unknown dependencies between decision variables across objectives, that
in the case of conflicts, must be traded off (Purshouse and Fleming
provide a taxonomy of such complexity [42]).
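
The dominance relation behind the non-dominated set can be sketched in
a few lines of Ruby. The example assumes that all objectives are
minimized and that each candidate is already represented by its vector
of objective values; the function names and the sample points are
illustrative assumptions.

```ruby
# a dominates b if a is no worse on every objective and strictly better
# on at least one (all objectives are minimized).
def dominates?(a, b)
  pairs = a.zip(b)
  pairs.all? { |x, y| x <= y } && pairs.any? { |x, y| x < y }
end

# Keep only the points that no other point dominates.
def non_dominated(points)
  points.reject do |candidate|
    points.any? { |other| dominates?(other, candidate) }
  end
end

points = [[1, 5], [2, 3], [3, 4], [4, 1], [5, 5]]
front = non_dominated(points)
p front
```

Here [3, 4] is dominated by [2, 3] and [5, 5] is dominated by [1, 5], so
three points remain. This pairwise filter takes quadratic time in the
number of points, which is acceptable for a sketch but not for large
populations.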

Constraint Satisfaction Problems (CSP) involve the optimization of
decision variables under a set of constraints. The principal complexity
in such problems is in locating structures that are feasible or violate the
least number of constraints, optimizing such feasibility [27, 54].
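
One common way to turn feasibility into something that can be optimized
is to count constraint violations and minimize that count. The sketch
below does this for the N-Queens puzzle, which is used here purely as
an assumed example problem; the board encoding (one queen per column,
board[i] giving its row) is likewise an assumption.

```ruby
# Count the pairs of queens that attack each other.
# board[i] is the row of the queen placed in column i.
def violations(board)
  board.each_index.to_a.combination(2).count do |i, j|
    same_row = board[i] == board[j]
    same_diagonal = (board[i] - board[j]).abs == (i - j).abs
    same_row || same_diagonal
  end
end

puts violations([1, 3, 0, 2]) # a feasible placement for 4 queens
puts violations([0, 1, 2, 3]) # every pair shares a diagonal
```

A candidate with zero violations is feasible, so an optimizer can use
this count directly as its cost function.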

Search Space Partitioning involves partitioning of the decision vari- 
able search space (for example see Multispace Search by Gu et al. 
[14, 21, 22]). This is a critical consideration given that for equal-sized 
dimensional bounds on parameters, an increase in decision variables 
results in an exponential increase in the volume of the space to search. 
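
The growth is easy to see with a little arithmetic. For a hypercube
whose sides all have width w, the volume with n decision variables is
w raised to the power n. The width of 10.0 below is an arbitrary value
assumed for illustration.

```ruby
# Volume of a hypercube search space with the same bounds on every variable.
def search_volume(width, dimensions)
  width**dimensions
end

[1, 2, 5, 10].each do |n|
  puts "n=#{n} volume=#{search_volume(10.0, n)}"
end
```

Each additional variable multiplies the volume by the width, so a
fixed budget of samples covers a rapidly shrinking fraction of the
space.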

Availability Decomposition Optimization problems may be par- 
titioned by the concerns of temporal and spatial distribution of 1) 
information availability, and 2) computation availability. An interesting 
area of research regarding variable information availability for optimiza- 
tion problems is called Interactive Evolutionary Computation, in which 
one or a collection of human operators dynamically interact with an 
optimization process [49]. Example problem domains include but are 
not limited to computer graphics, industrial design, image processing, 
and drug design. 

There is an increasing demand to exploit clusters of heterogeneous 
workstations to complete large-scale distributed computation tasks like 
optimization, typically in an opportunistic manner such as when in- 
dividual machines are underutilized. The effect is that optimization 
strategies such as random partitioning of the search space (indepen- 
dent non-interacting processing) are required to take advantage of such 
environments for optimization problems [32, 46]. 

Meta Optimization One may optimize at a level above that
considered in previous sections. Two examples are 1) the iterative
generation of an inductive model, called multiple restart optimization,
and 2) the optimization of the parameters of the process that generates
an inductive model of an optimization problem. Multiple or iterative
restarts involve multiple independent algorithm executions from different
(random) starting conditions. This is generally considered a method for
achieving an improved result in difficult optimization problems where
a given strategy is deceived by local or false optima [24, 34], typically
requiring a restart schedule [17].
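
A minimal sketch of multiple restarts is given below. A simple
stochastic local search is run from several random starting points and
the best result is kept. The multimodal cost function, the fixed
number of restarts (the simplest possible restart schedule), and all
parameter values are assumptions for illustration.

```ruby
# Multimodal one-dimensional cost function (an illustrative choice).
def cost(x)
  x**2 + 10.0 * Math.sin(3.0 * x)
end

# Greedy stochastic local search: accept only strictly improving moves.
def local_search(start, step, iterations, rng)
  current = start
  iterations.times do
    candidate = current + step * (2.0 * rng.rand - 1.0)
    current = candidate if cost(candidate) < cost(current)
  end
  current
end

rng = Random.new(3)
# Ten independent executions from random starting points in [-10, 10].
runs = Array.new(10) do
  local_search(-10.0 + 20.0 * rng.rand, 0.05, 200, rng)
end
best = runs.min_by { |x| cost(x) }
puts "best x=#{best} cost=#{cost(best)}"
```

Each run may stall in a different local optimum; taking the minimum
over the runs is what provides the benefit of the restart strategy.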
A second and well studied form of meta optimization involves the
optimization of the search process itself. Classical examples include the 
self-adaptation of mutation parameters (step sizes) in the Evolutionary 
Strategies (ES) and Evolutionary Programming (EP) approaches. Smith 
and Fogarty provided a review of genetic algorithms with adaptive 
strategies including a taxonomy in which the meta-adaptations are 
applied at one of three levels: 1) the population (adapting the overall 
sampling strategy), 2) the individual (adapting the creation of new 
samples in the decision variable space), and 3) components (modifying 
component contributions and/or individual step sizes as in ES and EP) 
[48]. 
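
The component-level case can be illustrated with a sketch of
self-adaptive mutation in the style used by ES, where each decision
variable carries its own step size that is itself mutated before being
used. The log-normal update and the learning rate tau below follow a
commonly used heuristic and are assumptions here, as are the data
layout and function names.

```ruby
# Standard normal sample via the Box-Muller transform.
def gaussian(rng)
  u1 = 1.0 - rng.rand # in (0, 1], avoids log(0)
  u2 = rng.rand
  Math.sqrt(-2.0 * Math.log(u1)) * Math.cos(2.0 * Math::PI * u2)
end

# Mutate the step sizes first, then use the new step sizes to perturb
# the decision variables.
def mutate(parent, rng)
  tau = 1.0 / Math.sqrt(2.0 * parent[:vector].size) # assumed learning rate
  sigmas = parent[:sigmas].map { |s| s * Math.exp(tau * gaussian(rng)) }
  vector = parent[:vector].each_with_index.map do |x, i|
    x + sigmas[i] * gaussian(rng)
  end
  { vector: vector, sigmas: sigmas }
end

rng = Random.new(7)
parent = { vector: [0.5, -1.5, 2.0], sigmas: [0.1, 0.1, 0.1] }
child = mutate(parent, rng)
p child
```

Because the step sizes are multiplied by a positive factor they remain
positive, and selection on the child implicitly selects for step sizes
that produce good moves.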

Function Approximation 

This section reviews a select set of strategies for addressing Function 
Approximation problems from the fields of Artificial Intelligence and 
Computational Intelligence to provide general insight into the state of 
the interaction between stochastic algorithms and the field. The review 
draws heavily from the fields of Artificial Neural Networks, specifically 
Competitive Learning, as well as related inductive Machine Learning 
fields such as Instance Based Learning. 

Vector Quantization Vector Quantization (VQ) refers to a method 
of approximating a target function using a set of exemplar (prototype 
or codebook) vectors. The exemplars represent a discrete subset of the 
problem, generally restricted to the features of interest using the natural 
representation of the observations in the problem space, typically an
unconstrained n-dimensional real-valued space. The VQ method
provides the advantage of a non-parametric model of a target function
(like instance-based and lazy learning such as the k-Nearest-Neighbor
method (kNN)) using a symbolic representation that is meaningful in
the domain (like tree-based approaches).

The promotion of compression addresses the storage and retrieval
concerns of kNN, although the selection of codebook vectors (the
so-called quantization problem) is a hard problem that is known to be
NP-complete [18]. More recently Kuncheva and Bezdek have worked
towards unifying quantization methods in the application to classification 
problems, referring to the approaches as Nearest Prototype Classifiers 
(NPC) and proposing a generalized nearest prototype classifier [28, 29]. 
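
Once a codebook exists, using it is simple. The sketch below classifies
a query by returning the label of its nearest prototype under Euclidean
distance. The two-prototype codebook and its labels are made up for
illustration, and the hard problem of selecting the codebook vectors is
not addressed.

```ruby
def distance(a, b)
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

# Return the label of the prototype closest to the query vector.
def classify(codebook, query)
  codebook.min_by { |prototype| distance(prototype[:vector], query) }[:label]
end

codebook = [
  { vector: [0.0, 0.0], label: :a },
  { vector: [1.0, 1.0], label: :b }
]
puts classify(codebook, [0.2, 0.1])
puts classify(codebook, [0.9, 0.8])
```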

Parallelization Instance-based approaches are inherently parallel 
given the generally discrete independent nature in which they are used, 
specifically in a case or per-query manner. As such, parallel hardware 
can be exploited in the preparation of the corpus of prototypes (parallel
preparation), and more so in the application of the corpus given its
read-only usage [1, 35, 39]. With regard to vector quantization specifically,
there is an industry centered around the design and development of VQ
and Winner-Take-All (WTA) algorithms and circuits given their usage to
compress digital audio and video data [36, 38].

Cooperative Methods Classical cooperative methods in the broader 
field of statistical machine learning are referred to as Ensemble Methods 
[37, 40] or more recently Multiclassifier Systems [20]. 

Boosting is based on the principle of combining a set of
quasi-independent weak learners that collectively are as effective as a single
strong learner [26, 44]. The seminal approach is called Adaptive
Boosting (AdaBoost) that involves the preparation of a series of classifiers,
where subsequent classifiers are prepared for the observations that are
misclassified by the preceding classifier models (creation of specialists)
[45].

Bootstrap Aggregation (bagging) involves partitioning the
observations into N randomly chosen subsets (with re-selection), and training
a different model on each [9]. Although robust to noisy datasets, the
approach requires careful consideration as to the consensus mechanism 
between the independent models for decision making. 
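
The two mechanical parts of bagging, drawing subsets with re-selection
and combining the models' decisions, can be sketched as follows.
Majority voting is only one possible consensus mechanism and is assumed
here for illustration; training the individual models is omitted.

```ruby
# Draw a sample of the same size as the data, with re-selection
# (the same observation may be chosen more than once).
def bootstrap(data, rng)
  Array.new(data.size) { data[rng.rand(data.size)] }
end

# Consensus by majority vote over the predictions of the models.
def majority_vote(predictions)
  predictions.group_by { |label| label }.max_by { |_, votes| votes.size }.first
end

rng = Random.new(5)
data = (1..10).to_a
subsets = Array.new(3) { bootstrap(data, rng) }
subsets.each { |subset| p subset }
puts majority_vote([:a, :b, :a])
```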

Stacked Generalization (stacking) involves creating a sequence of 
models of generally different types arranged into a stack, where subse- 
quently added models generalize the behavior (success or failure) of the 
model before it with the intent of correcting erroneous decision making 
[52, 56]. 

Functional Decomposition As demonstrated, it is common in en- 
semble methods to partition the dataset either explicitly or implicitly 
to improve the approximation of the underlying target function. A first 
important decomposition involves partitioning the problem space into 
sub-spaces based on the attributes, regular groups of attributes called 
features, and decision attributes such as class labels. A popular method 
for attribute-based partitioning is called the Random Subspace Method, 
involving the random partitioning of attributes, with a specialized
model prepared for each partition (commonly used on tree-based approaches)
[23]. 
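
A random partition of attributes, as described above, might be
sketched as below. Attribute indices are shuffled and dealt out in
round-robin fashion so that the groups are disjoint and of nearly equal
size. This is an assumption made to match the description in the text;
implementations of the Random Subspace Method may instead sample
overlapping subsets of attributes.

```ruby
# Randomly partition attribute indices into disjoint, near-equal groups.
def random_subspaces(num_attributes, num_subspaces, rng)
  shuffled = (0...num_attributes).to_a.shuffle(random: rng)
  groups = shuffled.each_with_index.group_by { |_, position| position % num_subspaces }
  groups.values.map { |pairs| pairs.map(&:first) }
end

rng = Random.new(11)
subspaces = random_subspaces(9, 4, rng)
p subspaces
```

A separate model would then be trained on each group of attributes and
the models combined, for example by voting.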

A related approach involves a hierarchical partitioning of the attribute
space into sub-vectors (sub-spaces) used to improve VQ-based
compression [19]. Another important functional decomposition method
involves the partitioning of the set of observations. There are many ways
in which observations may be divided, although common approaches
include pre-processing using clustering techniques to divide the set into
natural groups, additional statistical approaches that partition based
on central tendency and outliers, and re-sampling methods that are
required to reduce the volume of observations.

Availability Decomposition The availability of the observations required
to address function approximation in real-world problem domains
motivates the current state of the art in Distributed Data Mining (DDM, or
sometimes Collective Data Mining), Parallel Data Mining (PDM), and
Distributed Knowledge Discovery in Database (DKDD) [25]. The
general information availability concerns include 1) the intractable volume
of observations, and 2) the spatial (geographical) and temporal
distribution of information [59]. In many real-world problems it is infeasible to
centralize relevant observations for modeling, requiring scalable, load
balancing, and incremental acquisition of information [47].

Meta-Approximation The so-called ensemble or multiple-classifier
methods may be considered meta approximation approaches as they are 
not specific to a given modeling technique. As with function optimization, 
meta-approaches may be divided into restart methods and meta-learning 
algorithms. The use of restart methods is a standard practice for 
connectionist approaches, and more generally in approaches that use 
random starting conditions and a gradient or local search method of 
refinement. 

The method provides an opportunity for overcoming local optima
in the error-response surface, when there is an unknown time remaining 
until convergence [33], and can exploit parallel hardware to provide 
a speed advantage [8]. Ensemble methods and variants are examples 
of meta approximation approaches, as well as the use of consensus 
classifiers (gate networks in mixtures of experts) to integrate and weight 
the decision making properties from ensembles. 
9.6 Benchmarking Algorithms

When it comes to evaluating an optimization algorithm, every researcher
has their own thoughts on the way it should be done. Unfortunately, 
many empirical evaluations of optimization algorithms are performed and 
reported without addressing basic experimental design considerations. 
This section provides a summary of the literature on experimental 
design and empirical algorithm comparison methodology. This summary 
contains rules of thumb and the seeds of best practice when attempting 
to configure and compare optimization algorithms, specifically in the 
face of the no-free-lunch theorem.

9.6.1 Issues of Benchmarking Methodology 

Empirically comparing the performance of algorithms on optimization 
problem instances is a staple for the fields of Heuristics and Biologi- 
cally Inspired Computation, and the problems of effective comparison 
methodology have been discussed since the inception of these fields. 
Johnson suggests that the coding of an algorithm is the easy part of the 
process; the difficult work is getting meaningful and publishable results 
[24]. He goes on to provide a very thorough list of questions to consider
before racing algorithms, as well as what he describes as his "pet peeves" 
within the field of empirical algorithm research. 

Hooker [22] (among others) practically condemns what he refers to as
competitive testing of heuristic algorithms, calling it "fundamentally
anti-intellectual". He goes on to strongly encourage a rigorous methodology
of what he refers to as scientific testing where the aim is to investigate 
algorithmic behaviors. 

Barr, Golden et al. [1] list a number of properties worthy of a heuristic 
method making a contribution, which can be paraphrased as: efficiency,
efficacy, robustness, complexity, impact, generalizability, and innovation. 
This is interesting given that many (perhaps a majority) of conference 
papers focus on solution quality alone (one aspect of efficacy). In their 
classical work on reporting empirical results of heuristics Barr, Golden 
et al. specify a loose experimental setup methodology with the following 
steps: 

1. Define the goals of the experiment. 

2. Select measure of performance and factors to explore. 

3. Design and execute the experiment. 

4. Analyze the data and draw conclusions. 

5. Report the experimental results. 
They then suggest eight guidelines for reporting results. In summary,
they are: reproducibility, specify all influential factors (code, computing
environment, etc), be precise regarding measures, specify parameters, 
use statistical experimental design, compare with other methods, reduce 
variability of results, and ensure results are comprehensive. They then 
clarify these points with examples. 

Peer, Engelbrecht et al. [32] summarize the problems of algorithm 
benchmarking (with a bias toward particle swarm optimization) to the 
following points: duplication of effort, insufficient testing, failure to 
test against state-of-the-art, poor choice of parameters, conflicting
results, and invalid statistical inference. Eiben and Jelasity [14] cite
four problems with the state of benchmarking evolutionary algorithms:
1) test instances are chosen ad hoc from the literature, 2) results are
provided without regard to research objectives, 3) scope of generalized 
performance is generally too broad, and 4) results are hard to repro- 
duce. Gent and Walsh provide a summary of simple dos and don'ts for 
experimentally analyzing algorithms [20]. For an excellent introduction 
to empirical research and experimental design in artificial intelligence 
see Cohen's book "Empirical Methods for Artificial Intelligence" [10].

The theme of the classical works on algorithm testing methodology 
is that there is a lack of rigor in the field. The following sections will 
discuss three main problem areas to consider before benchmarking, 
namely 1) treating algorithms as complex systems that need to be tuned
before being applied, 2) considerations when selecting problem instances for
benchmarking, and 3) the selection of measures of performance and 
statistical procedures for testing experimental hypotheses. A final section 
4) covers additional best practices to consider. 

9.6.2 Selecting Algorithm Parameters 

Optimization algorithms are parameterized, although in the majority 
of cases the effect of adjusting algorithm parameters is not fully un- 
derstood. This is because unknown non-linear dependencies commonly 
exist between the variables resulting in the algorithm being considered 
a complex system. Further, one must be careful when generalizing the 
performance of parameters across problem instances, problem classes, 
and domains. Finally, given that algorithm parameters are typically 
a mixture of real and integer numbers, exhaustively enumerating the 
parameter space of an algorithm is commonly intractable. 
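
The combinatorial growth is visible even when each parameter is
restricted to a handful of discrete values. The sketch below enumerates
a small grid for three hypothetical genetic algorithm parameters; the
parameter names and candidate values are assumptions for illustration.

```ruby
# A coarse discretization of three hypothetical parameters.
grid = {
  population_size: [20, 50, 100, 200],
  crossover_rate: [0.6, 0.7, 0.8, 0.9, 1.0],
  mutation_rate: [0.001, 0.005, 0.01, 0.05, 0.1]
}

names = grid.keys
first_values, *other_values = grid.values
# The Cartesian product yields one configuration per combination of values.
configurations = first_values.product(*other_values).map do |values|
  names.zip(values).to_h
end

puts configurations.size # 4 * 5 * 5 = 100
p configurations.first
```

With only three parameters and a few values each there are already
100 configurations, and each one must be run many times on every
problem instance because the algorithms are stochastic.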

There are many solutions to this problem such as self-adaptive 
parameters, meta-algorithms (for searching for good parameter values), 
and methods of performing sensitivity analysis over parameter ranges. A 
good introduction to the parameterization of genetic algorithms is Lobo, 
Lima et al. [27]. The best and self-evident place to start (although often 
ignored [14]) is to investigate the literature and see what parameters
have been used historically. Although not a robust solution, it may prove
to be a useful starting point for further investigation. The traditional 
approach is to run an algorithm on a large number of test instances 
and generalize the results [37]. We, as a field, haven't really come 
much further than this historical methodology other than perhaps the 
application of more and differing statistical methods to decrease effort 
and better support findings. 

A promising area of study involves treating the algorithm as a 
complex system, where problem instances may become yet another 
parameter of the model [7, 36]. From here, sensitivity analysis can be 
performed in conjunction with statistical methods to discover parameters 
that have the greatest effect [8] and perhaps generalize model behaviors. 

Francois and Lavergne [18] mention the deficiencies of the traditional 
trial-and-error and experienced-practitioner approaches to parameter 
tuning, further suggesting that seeking general rules for
parameterization will lead to optimization algorithms that offer neither convergent
nor efficient behaviors. They offer a statistical model for evolutionary
algorithms that describes a functional relationship between algorithm 
parameters and performance. Nannen and Eiben [29, 30] propose a 
statistical approach called REVAC (previously Calibration and Rele- 
vance Estimation) to estimating the relevance of parameters in a genetic 
algorithm. Coy, Golden et al. [12] use a statistical steepest descent
procedure for locating good parameters for metaheuristics on
many different combinatorial problem instances. 

Bartz-Beielstein [3] used a statistical experimental design method- 
ology to investigate the parameterization of the Evolutionary Strategy 
(ES) algorithm. A sequential statistical methodology is proposed by 
Bartz-Beielstein, Parsopoulos et al. [4] for investigating the parameteri- 
zation and comparisons between the Particle Swarm Optimization (PSO) 
algorithm, the Nelder-Mead Simplex Algorithm (direct search), and the 
Quasi-Newton algorithm (derivative-based). Finally, an approach that is 
popular within the metaheuristic and Ant Colony Optimization (ACO) 
community is to use automated Monte Carlo and statistical procedures 
for sampling discretized parameter space of algorithms on benchmark 
problem instances [6]. Similar racing procedures have also been applied 
to evolutionary algorithms [41]. 

9.6.3 Problem Instances 

This section focuses on issues related to the selection of function opti- 
mization test instances, but the general theme of cautiously selecting 
problem instances is generally applicable. 

Common lists of test instances include those of De Jong [25], Fogel [17],
and Schwefel [38]. Yao, Liu et al. [40] list many canonical test instances
as do Schaffer, Caruana et al. [37]. Gallagher and Yuan [19] review test
function generators and propose a tunable mixture of Gaussians test
problem generator. Finally, MacNish [28] proposes using fractal-based
test problem generators via a web interface.

The division of test problems into classes is another axiom of modern
optimization algorithm research, although the issues with this
methodology are the taxonomic criteria for problem classes and the selection
of problem instances for classes.

Eiben and Jelasity [14] strongly support the division of problem 
instances into categories and encourage the evaluation of optimization 
algorithms over a large number of test instances. They suggest classes 
could be natural (taken from the real world), or artificial (simplified 
or generated). In their paper on understanding the interactions of GA 
parameters, Deb and Agrawal [13] propose four structural properties 
of problems for testing genetic algorithms: multi-modality, deception,
isolation, and collateral noise. Yao, Liu et al. [40] divide their large
test dataset into the categories of unimodal, 'multimodal-many local
optima', and 'multimodal-few local optima'. Whitley, Rana et al. [39]
provide a detailed study on the problems of selecting test instances for 
genetic algorithms. They suggest that difficult problem instances should 
be non-linear, non-separable, and non-symmetric. 

English [15] suggests that many functions in the field of EC are 
selected based on structures in the response surface (as demonstrated 
in the above examples), and that they inherently contain a strong 
Euclidean bias. The implication is that the algorithms already have 
some a priori knowledge about the domain built into them and that 
results are always reported on a restricted problem set. This is a reminder 
that instances are selected to demonstrate algorithmic behavior, rather 
than performance. 

9.6.4 Measures and Statistical Methods 

There are many ways to measure the performance of an optimization 
algorithm for a problem instance, although the most common involves 
a quality (efficacy) measure of solution(s) found (see the following for 
lists and discussion of common performance measures [1, 4, 5, 14, 23]). 
Most biologically inspired optimization algorithms have a stochastic 
element, typically in their starting position(s) and in the probabilistic 
decisions made during sampling of the domain. Thus, the performance 
measurements must be repeated a number of times to account for the 
stochastic variance, which could also be a measure of comparison between 
algorithms. 
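
As a sketch of this practice, the example below repeats a stochastic
procedure 30 times with different seeds and summarizes the resulting
quality measure by its mean and sample standard deviation. The run_once
function is a stand-in for a real optimizer, and the number of runs and
seeds are assumptions for illustration.

```ruby
def mean(values)
  values.sum / values.size.to_f
end

# Sample standard deviation (divides by n - 1).
def stdev(values)
  m = mean(values)
  Math.sqrt(values.sum { |v| (v - m)**2 } / (values.size - 1).to_f)
end

# Stand-in for one run of a stochastic optimizer: the best cost from
# 100 uniform random samples of x^2 on [-5, 5].
def run_once(seed)
  rng = Random.new(seed)
  Array.new(100) { (-5.0 + 10.0 * rng.rand)**2 }.min
end

results = (1..30).map { |seed| run_once(seed) }
puts "mean=#{mean(results)} stdev=#{stdev(results)}"
```

Recording the seed of each run, as done here, also makes the individual
runs repeatable.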

Irrespective of the measures used, sound statistical experimental 
design requires the specification of 1) a null hypothesis (no change), 
2) alternative hypotheses (difference, directional difference), and 3) 
acceptance or rejection criteria for the hypothesis. The null hypothesis is 
commonly stated as the equality between two or more central tendencies
(means or medians) of a quality measure in a typical case of comparing
stochastic-based optimization algorithms on a problem instance. 

Peer, Engelbrecht et al. [32] and Birattari and Dorigo [5] provide
a basic introduction (suitable for an algorithm-practitioner) into the
appropriateness of various statistical tests for algorithm comparisons.
For a good introduction to statistics and data analysis see Peck et al. [31],
for an introduction to non-parametric methods see Hollander and Wolfe
[21], and for a detailed presentation of parametric and nonparametric
methods and their suitability of application see Sheskin [23]. For an
excellent open source software package for performing statistical analysis
on data see the R Project. 5

To summarize, parametric statistical methods are used for
interval and ratio data (like a real-valued performance measure), and
non-parametric methods are used for ordinal, categorical and rank-based
data. Interval data is typically converted to ordinal data when salient
constraints of desired parametric tests (such as assumed normality of
distribution) are broken such that the less powerful nonparametric tests
can be used. The use of nonparametric statistical tests may be preferred
as some authors [9, 32] claim the distribution of cost values is very
asymmetric and/or not Gaussian. It is important to remember that
most parametric tests degrade gracefully.

Chiarandini, Basso et al. [9] provide an excellent case study for using 
the permutation test (a nonparametric statistical method) to compare 
stochastic optimizers by running each algorithm once per problem 
instance, and multiple times per problem instance. While rigorous, their 
method appears quite complex and their results are difficult to interpret. 
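
The basic idea of a permutation test is simpler than the full case
study suggests. The following sketch is a generic Monte Carlo
permutation test on the difference in mean cost between two samples; it
is not the procedure used by Chiarandini, Basso et al., and the test
statistic, number of permutations, and sample data are assumptions.

```ruby
def mean(values)
  values.sum / values.size.to_f
end

# Two-sided Monte Carlo permutation test on the difference in means.
# Returns an estimated p-value for the null hypothesis of no difference.
def permutation_test(sample_a, sample_b, permutations, rng)
  observed = (mean(sample_a) - mean(sample_b)).abs
  pooled = sample_a + sample_b
  extreme = permutations.times.count do
    shuffled = pooled.shuffle(random: rng)
    a = shuffled.first(sample_a.size)
    b = shuffled.drop(sample_a.size)
    (mean(a) - mean(b)).abs >= observed
  end
  # Add one to both counts so the estimate is never exactly zero.
  (extreme + 1.0) / (permutations + 1.0)
end

costs_a = [1.0, 2.0, 3.0, 4.0, 5.0]
costs_b = [11.0, 12.0, 13.0, 14.0, 15.0]
p_value = permutation_test(costs_a, costs_b, 2000, Random.new(1))
puts "p=#{p_value}"
```

A small p-value indicates that a difference as large as the observed
one rarely arises when the labels of the two algorithms are shuffled
at random.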

Barrett, Marathe et al. [2] provide a rigorous example of applying
the parametric test Analysis of Variance (ANOVA) to three different
heuristic methods on a small sample of scenarios. Reeves and Wright
[34, 35] also provide an example of using ANOVA in their investigation 
into epistasis on genetic algorithms. In their tutorial on the experimental 
investigation of heuristic methods, Rardin and Uzsoy [33] warn against 
the use of statistical methods, claiming their rigidity as a problem, 
and the importance of practical significance over that of statistical 
significance. They go on in the face of their own objections to provide 
an example of using ANOVA to analyze the results of an illustrative 
case study. 

Finally, Peer, Engelbrecht et al. [32] highlight a number of case study
example papers that use statistical methods inappropriately. In their
OptiBench system and method, algorithm results are standardized,
ranked according to three criteria and compared using the Wilcoxon
Rank-Sum test, a non-parametric alternative to the commonly used
Student's t-test.
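
The rank-sum statistic itself is straightforward to compute. The sketch
below pools two samples, assigns average ranks to ties, sums the ranks
of the first sample, and standardizes the sum using the usual normal
approximation without a tie correction. It is a generic illustration
rather than the OptiBench procedure, and in practice a statistics
package such as R would be used to obtain p-values.

```ruby
# Rank each value within the list, giving tied values their average rank.
def ranks(values)
  sorted = values.sort
  values.map do |v|
    first = sorted.index(v) + 1
    last = sorted.rindex(v) + 1
    (first + last) / 2.0
  end
end

# Sum of the pooled ranks that belong to the first sample.
def rank_sum(sample_a, sample_b)
  ranks(sample_a + sample_b).first(sample_a.size).sum
end

# Normal approximation of the standardized rank sum (no tie correction).
def rank_sum_z(sample_a, sample_b)
  n1 = sample_a.size
  n2 = sample_b.size
  expected = n1 * (n1 + n2 + 1) / 2.0
  variance = n1 * n2 * (n1 + n2 + 1) / 12.0
  (rank_sum(sample_a, sample_b) - expected) / Math.sqrt(variance)
end

a = [1.1, 2.2, 3.3]
b = [4.4, 5.5, 6.6]
puts "W=#{rank_sum(a, b)} z=#{rank_sum_z(a, b)}"
```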

9.6.5 Other 

Another pervasive problem in the field of optimization is the repro- 
ducibility (implementation) of an algorithm. An excellent solution to 
this problem is making source code available by creating or collaborat- 
ing with open-source software projects. This behavior may result in 
implementation standardization, a reduction in the duplication of effort 
for experimentation and repeatability, and perhaps more experimental 
accountability [14, 32]. 

Peer, Engelbrecht et al. [32] stress the need to compare to the
state-of-the-art implementations rather than the historic canonical
implementations to give a fair and meaningful evaluation of performance.

Another area that is often neglected is that of algorithm descriptions, 
particularly in regard to reproducibility. Pseudocode is often used, 
although (in most cases) in an inconsistent manner and almost always 
without reference to a recognized pseudocode standard or mathemat- 
ical notation. Many examples are a mix of programming languages, 
English descriptions and mathematical notation, making them difficult 
to follow, and commonly impossible to implement in software due to 
incompleteness and ambiguity. 

An excellent tool for comparing optimization algorithms in terms
of their asymptotic behavior from the field of computational complexity
is the Big-O notation [11]. In addition to clarifying aspects of the
algorithm, it provides a problem independent way of characterizing an
algorithm's space and/or time complexity.

9.6.6 Summary 

It is clear that there is no silver bullet to experimental design for 
empirically evaluating and comparing optimization algorithms, although 
there are as many methods and options as there are publications on 
the topic. The field of stochastic optimization has not yet agreed upon 
general methods of application like the field of data mining (processes 
such as Knowledge Discovery in Databases (KDD) [16]). Although 
these processes are not experimental methods for comparing machine 
learning algorithms, they do provide a general model to encourage 
the practitioner to consider important issues before application of an 
approach. 

Finally, it is worth pointing out a somewhat controversially titled 
paper by De Jong [26] that provides a reminder that although the 
genetic algorithm has been shown to solve function optimization, it is 
not innately a function optimizer, and function optimization is only a 
demonstration of this complex adaptive system's ability to learn. It is a 
reminder to be careful not to link an approach too tightly with a domain,
particularly if the domain was chosen for demonstration purposes. 
Appendix A

Ruby: Quick-Start Guide

A.1 Overview

All code examples in this book are provided in the Ruby programming 
language. This appendix provides a high-level introduction to the Ruby 
programming language. This guide is intended for programmers of an 
existing procedural language (such as Python, Java, C, C++, C#) to 
learn enough Ruby to be able to interpret and modify the code examples 
provided in the Clever Algorithms project. 

A.2 Language Basics

This section summarizes the basics of the language, including variables, 
flow control, data structures, and functions. 

A.2.1 Ruby Files

Ruby is an interpreted language, meaning that programs are typed as
text into a .rb file which is parsed and executed at the time the script
is run. For example, the following snippet shows how to invoke the
Ruby interpreter on a script in the file genetic_algorithm.rb from the
command line:

ruby genetic_algorithm.rb

Ruby scripts are written in ASCII text and are parsed and executed 
in a linear manner (top to bottom). A script can define functionality 
(as modules, functions, and classes) and invoke functionality (such as 
calling a function). 
Comments in Ruby are defined by a # character, after which the
remainder of the line is ignored. The only exception is in strings, where 
the character can have a special meaning. 
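As a minimal sketch of the points above, the following short script defines a function, invokes it, and shows where a # starts a comment and where it does not. The file name greet.rb and the function name greeting are hypothetical and are used only for illustration.

```ruby
# greet.rb: a hypothetical script name used only for this sketch.

# Define functionality: nothing is printed when this definition is read.
def greeting(name)
  "Hello, #{name}!" # inside a string, # followed by {} is not a comment
end

# Invoke functionality: statements run top to bottom as they are reached.
message = greeting("Ruby")
puts message            # this trailing comment is ignored by the interpreter
puts "# not a comment"  # a # inside a string is printed as ordinary text
```

Running the script with ruby greet.rb prints the greeting followed by the literal text of the second string.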

The Ruby interpreter can be used in an interactive manner by typing 
out a Ruby script directly. This can be useful for testing specific behavior. 
You are encouraged to open the Ruby interpreter and follow along
with this guide by typing out the examples. The Ruby interpreter
can be opened from the command line by typing irb and exited again 
by typing exit from within the interpreter. 

A.2.2 Variables

A variable holds a piece of information such as an integer, a floating
point number, a boolean, or a string.

a = 1 # a holds the integer value '1'
b = 2.2 # b holds the floating point value '2.2'
c = false # c holds the boolean value false
d = "hello, world" # d holds the string value 'hello, world'

Ruby has a number of different data types (such as numbers and 
strings) although it does not enforce the type safety of variables. Instead 
it uses 'duck typing', where as long as the value of a variable responds 
appropriately to messages it receives, the interpreter is happy. 
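The following sketch illustrates duck typing under the assumption of a hypothetical helper function named double. The function never checks the class of its argument; it only relies on the argument responding to the * message.

```ruby
# double is a hypothetical helper: it works for any value that
# responds to the * message, whatever its class happens to be.
def double(value)
  value * 2
end

puts double(4)            # => 8
puts double(1.5)          # => 3.0
puts double("ab")         # => abab
puts double([1]).inspect  # => [1, 1]

x = 10       # x holds an integer
x = "ten"    # the same variable may later hold a string
puts x.respond_to?(:upcase) # => true, so x.upcase can be called safely
```

The respond_to? function is one way to ask a value whether it understands a message before sending it.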

Strings can be constructed from static text as well as the values of 
variables. The following example defines a variable and then defines a 
string that contains the variable. The #{} is a special sequence that
informs the interpreter to evaluate the contents inside the braces,
in this case to evaluate the variable n, which happens to be assigned
the value 55.

n = 55 # an integer 

s = "The number is: #{n}" # => The number is: 55 
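The #{} sequence is not limited to variable names. As a short sketch, any expression can be placed inside the braces, and the sequence is only evaluated in double-quoted strings.

```ruby
n = 55

puts "Double: #{n * 2}"              # => Double: 110
puts "Values: #{[1, 2, 3].inspect}"  # => Values: [1, 2, 3]

# Single-quoted strings are not interpolated.
puts 'Single quotes: #{n}'           # => Single quotes: #{n}
```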

The values of variables can be compared using the == for equality 
and != for inequality. The following provides an example of testing the 
equality of two variables and assigning the boolean (true or false) result 
to a third variable. 

a = 1 
b = 2 

c = (a == b) # false 

Ruby supports the classical && and || for AND and OR, but it also
supports the and and or keywords themselves. The keywords have lower
precedence than assignment, so the expression is wrapped in brackets.

a = 1
b = 2

c = (a == 1 and b == 2) # true
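The symbolic and keyword forms are not interchangeable in every position, because the and and or keywords bind more loosely than assignment. The following sketch shows the difference using a condition that is deliberately false.

```ruby
a = 1
b = 2

c = a == 1 && b == 3     # && binds tighter than =, so c is false
d = (a == 1 and b == 3)  # brackets give the same result with and
e = a == 1 and b == 3    # parsed as (e = (a == 1)) and (b == 3)

puts c # => false
puts d # => false
puts e # => true
```

When the result of a logical expression is assigned to a variable, using && and ||, or wrapping the expression in brackets, avoids this surprise.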
A.2.3 Flow Control

A script is a sequence of statements that invoke pre-defined functionality. 
There are structures for manipulating the flow of control within the 
script, such as conditional statements and loops. 

Conditional statements can take the traditional forms of if condition 
then action, with the standard variants of if-then-else and if-then-elseif. 
For example: 

a = 1
b = 2
if (a == b)
  a += 1 # equivalent to a = a + 1
elsif a == 1 # brackets around conditions are optional
  a = 1 # this line is executed
else
  a = 0
end

Conditional statements can also be added to the end of statements. 
For example, a variable can be assigned a value only if a condition holds, 
defined all on one line. 

a = 2 

b = 99 if a == 2 # b => 99 

Loops allow a set of statements to be repeatedly executed while a
condition holds or until a condition is met.

a = 0
while a < 10 # condition before the statements
  puts a += 1
end

b = 10
begin
  puts b -= 1
end until b==0 # condition after the statements

As with the if conditions, the loops can be added to the end of 
statements allowing a loop on a single line. 

a = 0
puts a += 1 while a<10
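As a further sketch, the loop below halves a step size until it drops below a threshold, first with while and then with until on a single line. The starting value and the threshold are arbitrary values chosen for illustration.

```ruby
# Halve a step size while it is at or above the threshold.
step = 1.0
iterations = 0
while step >= 0.01
  step /= 2.0
  iterations += 1
end
puts "#{iterations} halvings, final step #{step}" # => 7 halvings, final step 0.0078125

# The same loop as a one-line until, which repeats while the condition is false.
step = 1.0
step /= 2.0 until step < 0.01
puts step # => 0.0078125
```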



A.2.4 Arrays and Hashes

An array is a linear collection of variables and can be defined by creating 
a new Array object. 

a = [] # define a new array implicitly
a = Array.new # explicitly create a new array
a = Array.new(10) # create a new array with space for 10 items

The contents of an array can be accessed by the index of the element.

a = [1, 2, 3] # inline declaration and definition of an array
b = a[0] # first element, equivalent to a.first

Arrays are not fixed in size, and elements can be added and deleted
dynamically.

a = [1, 2, 3] # inline declaration and definition of an array
a << 4 # => [1, 2, 3, 4]
a.delete_at(0) # => returns 1, a is now [2, 3, 4]
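The following sketch gathers a few more array operations that are likely to be useful when reading code: pre-sized arrays, access from either end, and adding or removing elements at the tail. The values are arbitrary.

```ruby
a = Array.new(3)     # three slots, each initially nil
puts a.inspect       # => [nil, nil, nil]

a = [10, 20, 30]
puts a.first         # => 10
puts a.last          # => 30
puts a[-1]           # => 30, negative indices count from the end
puts a.size          # => 3
puts a[5].inspect    # => nil, reading past the end does not raise an error

a << 40              # append an element
a.push(50)           # push also appends
removed = a.pop      # removes and returns the last element
puts removed         # => 50
puts a.inspect       # => [10, 20, 30, 40]
```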

A hash is an associative array, where values can be stored and 
accessed using a key. A key can be an object (such as a string) or a 
symbol. 

h = {} # empty hash
h = Hash.new

h = {"A"=>1, "B"=>2} # string keys
a = h["A"] # => 1

h = {:a=>1, :b=>2} # symbol keys
a = h[:a] # => 1
h[:c] = 3 # add new key-value combination
h[:d] # => nil as there is no value
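As a sketch of how a hash with symbol keys might be used to hold a small record, the example below stores a vector and a cost together, visits each key-value pair, and reads a missing key safely. The key names :vector, :cost, and :fitness are illustrative assumptions.

```ruby
# A hypothetical record holding a vector and its cost.
candidate = {:vector=>[0.5, -1.2], :cost=>1.69}

# Visit each key-value pair in insertion order.
candidate.each {|key, value| puts "#{key} = #{value.inspect}"}

puts candidate.key?(:cost)           # => true
puts candidate[:fitness].inspect     # => nil, the key is missing
puts candidate.fetch(:fitness, 0.0)  # => 0.0, a default for a missing key
puts candidate.keys.inspect          # => [:vector, :cost]
```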



A.2.5 Functions and Blocks

The puts function can be used to write a line to the console. 

puts("Testing 1, 2, 3") # => Testing 1, 2, 3
puts "Testing 4, 5, 6" # note brackets are not required for the function call



Functions allow a program to be partitioned into discrete actions
that are pre-defined and reusable. The following is an example of a
simple function.

def test_function()
  puts "Test!"
end

test_function # => Test!



A function can take a list of variables called function arguments. 

def test_function(a)
  puts "Test: #{a}"
end

test_function("me") # => Test: me

Function arguments can have default values, meaning that if the
argument is not provided in a call to the function, the default is used. 

def test_function(a="me")
  puts "Test: #{a}"
end

test_function() # => Test: me
test_function("you") # => Test: you



A function can return a variable, called a return value. 

def square(x)
  return x**2 # note the ** is a power-of operator in Ruby
end

puts square(3) # => 9
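The return keyword is optional in Ruby: a function returns the value of the last expression it evaluates. The sketch below shows both styles. The function names square_implicit and sign are hypothetical.

```ruby
# The value of the last expression is returned without the return keyword.
def square_implicit(x)
  x**2
end

# An explicit return is still useful for leaving a function early.
def sign(x)
  return -1 if x < 0
  return 0 if x == 0
  1
end

puts square_implicit(3) # => 9
puts sign(-5)           # => -1
puts sign(0)            # => 0
puts sign(12)           # => 1
```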



A block is a collection of statements that can be treated as a single
unit. A block can be provided to a function and it can be provided with
parameters. A block can be defined using curly brackets {} or the do
and end keywords. Parameters to a block are signified by |var|.

The following example shows an array with a block passed to the 
constructor of the Array object that accepts a parameter of the current 
array index being initialized and returns the value with which to initialize 
the array. 



b = Array.new(10) {|i| i} # define a new array initialized 0..9

# do...end block
b = Array.new(10) do |i| # => [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
  i * i
end





Everything is an object in Ruby, even numbers, and as such every- 
thing has some behaviors defined. For example, an integer has a .times 
function that can be called that takes a block as a parameter, executing 
the block the integer number of times. 



10.times {|i| puts i} # prints 0..9 each on a new line
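User-defined functions can accept blocks in the same way as the built-in ones. As a minimal sketch, a function hands control to the block it was given with the yield keyword, optionally passing parameters and using the value the block returns. The function names repeat_twice and apply_to_each are hypothetical.

```ruby
# Calls the provided block twice, passing a counter as the block parameter.
def repeat_twice
  yield 1
  yield 2
end

repeat_twice {|i| puts "call #{i}"} # prints call 1 then call 2

# Collects the value returned by the block for each element.
def apply_to_each(values)
  result = []
  values.each {|v| result << yield(v)}
  result
end

squares = apply_to_each([1, 2, 3]) {|v| v * v}
puts squares.inspect # => [1, 4, 9]
```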



A.3 Ruby Idioms

There are standard patterns for performing certain tasks in Ruby, such 
as assignment and enumerating. This section presents the common 
Ruby idioms used throughout the code examples in this book. 






A.3.1 Assignment

Assignment is the definition of variables (setting a variable to a value). 
Ruby allows mass assignment, for example, multiple variables can be 
assigned to respective values on a single line. 

a,b,c = 1,2,3 

Ruby also has special support for arrays, where variables can be mass- 
assigned from the values in an array. This can be useful if a function 
returns an array of values which are mass assigned to a collection of 
variables. 

a, b, c = [1, 2, 3] 

def get_min_max(vector)
  return [vector.min, vector.max]
end

v = [1,2,3,4,5]
min, max = get_min_max(v) # => 1, 5
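Mass assignment has a few further behaviors that are worth knowing when reading code. The following sketch shows swapping two variables, what happens when the number of variables and values differ, and the use of * to collect the remaining values.

```ruby
a, b = 1, 2
a, b = b, a                 # swap without a temporary variable
puts "#{a} #{b}"            # => 2 1

first, second = [1, 2, 3]   # surplus values are dropped
x, y, z = [1, 2]            # missing values become nil
head, *rest = [1, 2, 3, 4]  # the * collects whatever remains

puts second                 # => 2
puts z.inspect              # => nil
puts rest.inspect           # => [2, 3, 4]
```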



A.3.2 Enumerating

Those collections that are enumerable, such as arrays, provide convenient
functions for visiting each value in the collection. A very common idiom
is the use of the .each and .each_with_index functions on a collection,
each of which accepts a block. These functions are typically used with
an in-line block {} so that they fit onto one line.



[1,2,3,4,5].each {|v| puts v} # in-line block

# a do...end block
[1,2,3,4,5].each_with_index do |v,i|
  puts "#{i} = #{v}"
end
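Enumeration is often used to accumulate a result rather than to print values. As a sketch, the example below sums the squares of the values in an array, first with .each and an external variable and then with the .inject function, which carries the running total between calls to the block. The variable names are illustrative.

```ruby
vector = [1.0, -2.0, 3.0]

# Accumulate into a variable defined outside the block.
sum = 0.0
vector.each {|x| sum += x ** 2.0}
puts sum # => 14.0

# inject passes the running total (acc) into each call of the block.
total = vector.inject(0.0) {|acc, x| acc + x ** 2.0}
puts total # => 14.0

vector.each_with_index {|x, i| puts "x[#{i}] = #{x}"}
```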





The sort function is a very heavily-used enumeration function. It 
returns a copy of the collection that is sorted. 



a = [3, 2, 4, 1] 

a = a.sort # => [1, 2, 3, 4]

There are a few versions of the sort function including a version that 
takes a block. This version of the sort function can be used to sort the 
variables in the collection using something other than the actual direct 
values in the array. This is heavily used in code examples to sort arrays 
of hash maps by a particular key-value pair. The <=> operator is used 
to compare two values together, returning a -1, 0, or 1 if the first value 
is smaller, the same, or larger than the second. 
a = [{:quality=>2}, {:quality=>3}, {:quality=>1}]
a = a.sort {|x,y| x[:quality]<=>y[:quality]} # => ordered by quality
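The following sketch sorts an array of hashes in both directions and selects the lowest-cost entry. The :name and :cost keys and the values are assumptions made for illustration. The sort_by function is shown as a shorter alternative when ordering by a single value.

```ruby
population = [
  {:name=>"a", :cost=>2.5},
  {:name=>"b", :cost=>0.5},
  {:name=>"c", :cost=>1.0}
]

puts(1 <=> 2) # => -1, the first value is smaller
puts(2 <=> 2) # => 0
puts(3 <=> 2) # => 1

ascending = population.sort {|x, y| x[:cost] <=> y[:cost]}
descending = population.sort {|x, y| y[:cost] <=> x[:cost]} # swap x and y to reverse
by_cost = population.sort_by {|x| x[:cost]}                 # same order as ascending

best = ascending.first
puts best[:name]                                  # => b
puts descending.map {|x| x[:name]}.inspect        # => ["a", "c", "b"]
```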



A.3.3 Function Names

Given that everything is an object, executing a function on an object
(a behavior) can be thought of as sending a message to that object. 
For some messages sent to objects, there is a convention to adjust the 
function name accordingly. For example, functions that ask a question 
of an object (return a boolean) have a question mark (?) on the end of 
the function name. Those functions that change the internal state of an 
object (its data) have an exclamation mark on the end (!). When working 
with an imperative script (a script without objects) this convention 
applies to the data provided as function arguments. 



def is_rich?(amount)
  return amount >= 1000
end
puts is_rich?(99) # => false

def square_vector!(vector)
  vector.each_with_index {|v,i| vector[i] = v**2}
end
v = [2,2]
square_vector!(v)
puts v.inspect # => [4, 4]
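Ruby's built-in classes follow the same naming convention. As a short sketch using the Array class, functions ending in a question mark return a boolean, and sort has a counterpart ending in an exclamation mark that changes the array itself rather than returning a copy.

```ruby
values = [3, 1, 2]

puts values.empty?       # => false
puts values.include?(2)  # => true

sorted = values.sort     # returns a sorted copy
puts values.inspect      # => [3, 1, 2], the original is unchanged
puts sorted.inspect      # => [1, 2, 3]

values.sort!             # sorts the array in place
puts values.inspect      # => [1, 2, 3]
```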



A.3.4 Conclusions

This quick-start guide has only scratched the surface of the Ruby Pro- 
gramming Language. Please refer to one of the referenced textbooks on 
the language for a more detailed introduction to this powerful and fun 
programming language [1, 2]. 